Changqiao Wu

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

Oct 16, 2024

Mini-Omni2: Towards Open-source GPT-4o Model with Vision, Speech and Duplex

Oct 15, 2024

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Aug 30, 2024

TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval

Oct 03, 2022