Dacheng Yin

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Oct 15, 2024

ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Nov 30, 2023

Learning Trajectories are Generalization Indicators

May 04, 2023

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Apr 12, 2023

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

Oct 24, 2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Jun 28, 2022

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Feb 24, 2022

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Sep 12, 2021

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Feb 03, 2021