Yu-An Chung

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Oct 31, 2024

Seamless: Multilingual Expressive and Streaming Speech Translation
Dec 08, 2023

CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Sep 14, 2023

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
Aug 23, 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Dec 15, 2022

Speech-to-Speech Translation For A Real-world Unwritten Language
Nov 11, 2022

SSAST: Self-Supervised Audio Spectrogram Transformer
Oct 19, 2021

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Aug 07, 2021

AST: Audio Spectrogram Transformer
Apr 06, 2021

PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation
Feb 02, 2021