Zhi Yu

National Mobile Communications Research Laboratory, Southeast University, Nanjing, China

Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding

Nov 12, 2024
Via arXiv

SAM-SP: Self-Prompting Makes SAM Great Again

Aug 22, 2024
Via arXiv

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation

Jul 22, 2024
Via arXiv

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

Jul 17, 2024
Via arXiv

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Apr 08, 2024
Via arXiv

Less is More: A Closer Look at Multi-Modal Few-Shot Learning

Jan 10, 2024
Via arXiv

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Jan 03, 2024
Via arXiv

Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR

Aug 09, 2023
Via arXiv

Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics

Mar 28, 2023
Via arXiv

LORE: Logical Location Regression Network for Table Structure Recognition

Mar 07, 2023
Via arXiv