Picture for Yuxuan Wang

Yuxuan Wang

Sherman

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Add code
Jan 10, 2025
Viaarxiv icon

LongViTU: Instruction Tuning for Long-Form Video Understanding

Add code
Jan 09, 2025
Viaarxiv icon

Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding

Add code
Dec 23, 2024
Viaarxiv icon

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

Add code
Dec 13, 2024
Figure 1 for CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
Figure 2 for CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
Figure 3 for CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
Figure 4 for CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
Viaarxiv icon

Pushing Rendering Boundaries: Hard Gaussian Splatting

Add code
Dec 06, 2024
Viaarxiv icon

VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format

Add code
Nov 27, 2024
Viaarxiv icon

Edge-Assisted Accelerated Cooperative Sensing for CAVs: Task Placement and Resource Allocation

Add code
Nov 27, 2024
Viaarxiv icon

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

Add code
Nov 27, 2024
Viaarxiv icon

Vehicles, Pedestrians, and E-bikes: a Three-party Game at Right-turn-on-red Crossroads Revealing the Dual and Irrational Role of E-bikes that Risks Traffic Safety

Add code
Nov 04, 2024
Viaarxiv icon

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis

Add code
Nov 02, 2024
Figure 1 for Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Figure 2 for Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Figure 3 for Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Figure 4 for Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Viaarxiv icon