Kai Yu

Sherman

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Jan 15, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

What Does the Speaker Embedding Encode?

Dec 20, 2025
Via arXiv

MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging

Nov 17, 2025
Via arXiv

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction

Nov 08, 2025
Via arXiv

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025
Via arXiv

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

Oct 23, 2025
Via arXiv

DiSRouter: Distributed Self-Routing for LLM Selections

Oct 22, 2025
Via arXiv

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025
Via arXiv

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Sep 10, 2025
Via arXiv