Deqiang Jiang

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Jan 28, 2026
Via arXiv

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Jan 27, 2026
Via arXiv

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Oct 10, 2025
Via arXiv

TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs

May 27, 2025
Via arXiv

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

Oct 09, 2024
Via arXiv

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Jun 18, 2024
Via arXiv

HRVDA: High-Resolution Visual Document Assistant

Apr 10, 2024
Via arXiv

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Feb 29, 2024
Via arXiv

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023
Via arXiv

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Sep 03, 2023
Via arXiv