Yunhang Shen

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025

Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery

Feb 09, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Jan 27, 2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Jan 09, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025

Probability-density-aware Semi-supervised Learning

Dec 23, 2024

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Dec 12, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024