Dan Su

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Dec 03, 2024, via arXiv

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis

Oct 17, 2024, via arXiv

Nemotron-4 340B Technical Report

Jun 17, 2024, via arXiv

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Jun 03, 2024, via arXiv

Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder

Apr 15, 2024, via arXiv

Nemotron-4 15B Technical Report

Feb 27, 2024, via arXiv

MM-LLMs: Recent Advances in MultiModal Large Language Models

Jan 25, 2024, via arXiv

A High Fidelity and Low Complexity Neural Audio Coding

Oct 17, 2023, via arXiv

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

Sep 22, 2023, via arXiv

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Sep 04, 2023, via arXiv