Aswin Shanmugam Subramanian

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios

Jun 17, 2025

PHRASED: Phrase Dictionary Biasing for Speech Translation

Jun 10, 2025

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Feb 04, 2025

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Jun 12, 2024

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Mar 08, 2023

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

Dec 14, 2022

Reverberation as Supervision for Speech Separation

Nov 15, 2022

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

Oct 09, 2021

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Feb 16, 2021

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

Dec 23, 2020