Picture for Hongtao Xie

Hongtao Xie

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Add code
Mar 20, 2025
Viaarxiv icon

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

Add code
Mar 18, 2025
Viaarxiv icon

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Add code
Feb 19, 2025
Viaarxiv icon

A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions

Add code
Dec 12, 2024
Viaarxiv icon

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

Add code
Nov 24, 2024
Viaarxiv icon

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing

Add code
Nov 23, 2024
Figure 1 for Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Figure 2 for Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Figure 3 for Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Figure 4 for Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Viaarxiv icon

TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

Add code
Oct 14, 2024
Figure 1 for TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Figure 2 for TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Figure 3 for TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Figure 4 for TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Viaarxiv icon

How Control Information Influences Multilingual Text Image Generation and Editing?

Add code
Jul 16, 2024
Viaarxiv icon

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Add code
Jul 08, 2024
Viaarxiv icon

Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Add code
Jun 21, 2024
Figure 1 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation
Figure 2 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation
Figure 3 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation
Figure 4 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation
Viaarxiv icon