Picture for Shanghang Zhang

Shanghang Zhang

Training-free Regional Prompting for Diffusion Transformers

Add code
Nov 04, 2024
Viaarxiv icon

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

Add code
Oct 29, 2024
Viaarxiv icon

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Add code
Oct 29, 2024
Viaarxiv icon

EVA: An Embodied World Model for Future Video Anticipation

Add code
Oct 20, 2024
Viaarxiv icon

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Add code
Oct 06, 2024
Figure 1 for SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Figure 2 for SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Figure 3 for SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Figure 4 for SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Viaarxiv icon

Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

Add code
Sep 24, 2024
Viaarxiv icon

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

Add code
Aug 15, 2024
Figure 1 for FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Figure 2 for FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Figure 3 for FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Figure 4 for FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Viaarxiv icon

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Add code
Jul 30, 2024
Viaarxiv icon

Multimodal Large Language Models for Bioimage Analysis

Add code
Jul 29, 2024
Viaarxiv icon

MAVIS: Mathematical Visual Instruction Tuning

Add code
Jul 11, 2024
Viaarxiv icon