
Dacheng Yin

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Mar 13, 2025

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Oct 15, 2024

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Nov 30, 2023

Learning Trajectories are Generalization Indicators

May 04, 2023

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Apr 12, 2023

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

Oct 24, 2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Jun 28, 2022

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

Feb 24, 2022

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Sep 12, 2021