Hao Fei

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

Apr 03, 2025
Via arXiv

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Mar 31, 2025
Via arXiv

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Mar 30, 2025
Via arXiv

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

Mar 19, 2025
Via arXiv

Universal Scene Graph Generation

Mar 19, 2025
Via arXiv

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Mar 19, 2025
Via arXiv

Multi-Granular Multimodal Clue Fusion for Meme Understanding

Mar 16, 2025
Via arXiv

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Mar 16, 2025
Via arXiv

TAIL: Text-Audio Incremental Learning

Mar 06, 2025
Via arXiv

Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models

Mar 03, 2025
Via arXiv