Qingkai Fang

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

Jun 16, 2025

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

May 05, 2025

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Jan 07, 2025

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

Nov 25, 2024

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Sep 10, 2024

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

Jun 11, 2024

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Jun 11, 2024

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Jun 11, 2024

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Jun 05, 2024

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

Oct 20, 2023