Xuming Hu

Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free

Dec 31, 2024

Accelerating Diffusion Transformers with Dual Feature Caching

Dec 25, 2024

MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection

Dec 22, 2024

Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability

Dec 19, 2024

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models

Dec 16, 2024

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Dec 16, 2024

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

Dec 03, 2024

Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation

Nov 26, 2024

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

Nov 25, 2024

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Nov 22, 2024