Picture for Bozhou Li

Bozhou Li

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Add code
Feb 04, 2026
Viaarxiv icon

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Add code
Feb 03, 2026
Viaarxiv icon

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Add code
Feb 02, 2026
Viaarxiv icon

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Add code
Jan 27, 2026
Viaarxiv icon

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Add code
Dec 17, 2025
Viaarxiv icon

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Add code
Dec 09, 2025
Figure 1 for The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Figure 2 for The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Figure 3 for The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Figure 4 for The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Viaarxiv icon

ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models

Add code
May 27, 2025
Viaarxiv icon

Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Add code
Oct 08, 2024
Figure 1 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
Figure 2 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
Figure 3 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
Figure 4 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
Viaarxiv icon

Are Bigger Encoders Always Better in Vision Large Models?

Add code
Aug 01, 2024
Viaarxiv icon

A Survey of Multimodal Large Language Model from A Data-centric Perspective

Add code
May 26, 2024
Figure 1 for A Survey of Multimodal Large Language Model from A Data-centric Perspective
Figure 2 for A Survey of Multimodal Large Language Model from A Data-centric Perspective
Figure 3 for A Survey of Multimodal Large Language Model from A Data-centric Perspective
Figure 4 for A Survey of Multimodal Large Language Model from A Data-centric Perspective
Viaarxiv icon