Haogeng Liu

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

May 28, 2024

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

Oct 11, 2023

Video-CSR: Complex Video Digest Creation for Visual-Language Models

Oct 08, 2023

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Jun 12, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Jan 10, 2023