Di Zhang

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Apr 14, 2025
Via arXiv

InstructEngine: Instruction-driven Text-to-Image Alignment

Apr 14, 2025
Via arXiv

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Apr 09, 2025
Via arXiv

Integrated Sensing and Communications Over the Years: An Evolution Perspective

Apr 09, 2025
Via arXiv

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Apr 09, 2025
Via arXiv

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Mar 31, 2025
Via arXiv

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

Mar 31, 2025
Via arXiv

SketchVideo: Sketch-based Video Generation and Editing

Mar 30, 2025
Via arXiv

SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

Mar 26, 2025
Via arXiv

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention

Mar 25, 2025
Via arXiv