Kai Yu

Sherman

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Sep 10, 2025

POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models

Aug 28, 2025

MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR

Aug 26, 2025

Joint decoding method for controllable contextual speech recognition based on Speech LLM

Aug 12, 2025

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Jul 30, 2025

Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Jul 23, 2025

Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning

Jun 12, 2025

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning

Jun 06, 2025

Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding

May 30, 2025