Dacheng Yin

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Oct 15, 2024

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Nov 30, 2023

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023

Learning Trajectories are Generalization Indicators

May 04, 2023

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Apr 12, 2023

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

Oct 24, 2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Jun 28, 2022

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Feb 24, 2022

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Sep 12, 2021

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Feb 03, 2021