Di Zhang

SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

Mar 26, 2025

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention

Mar 25, 2025

Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings

Mar 24, 2025

Position: Interactive Generative Video as Next-Generation Game Engine

Mar 21, 2025

ST-Prompt Guided Histological Hypergraph Learning for Spatial Gene Expression Prediction

Mar 21, 2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Mar 18, 2025

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Mar 14, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs

Mar 13, 2025

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

Mar 12, 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis

Mar 09, 2025