Junkun Chen

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024

Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity

Sep 09, 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Jun 12, 2024

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Oct 06, 2023

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Nov 07, 2022

Is Self-Supervised Learning More Robust Than Supervised Learning?

Jun 10, 2022