Picture for Kai Shen

Kai Shen

MoonCast: High-Quality Zero-Shot Podcast Generation

Add code
Mar 19, 2025
Viaarxiv icon

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Add code
Mar 06, 2025
Viaarxiv icon

BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

Add code
Feb 05, 2025
Viaarxiv icon

FullStack Bench: Evaluating LLMs as Full Stack Coders

Add code
Dec 03, 2024
Figure 1 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 2 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 3 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Figure 4 for FullStack Bench: Evaluating LLMs as Full Stack Coders
Viaarxiv icon

Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

Add code
Jul 06, 2024
Figure 1 for Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Figure 2 for Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Figure 3 for Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Figure 4 for Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Viaarxiv icon

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

Add code
Jun 11, 2024
Viaarxiv icon

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Add code
Apr 06, 2024
Viaarxiv icon

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Add code
Mar 05, 2024
Viaarxiv icon

PromptTTS 2: Describing and Generating Voices with Text Prompt

Add code
Sep 05, 2023
Figure 1 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 2 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 3 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 4 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Viaarxiv icon

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Add code
May 04, 2023
Viaarxiv icon