Kartik Audhkhasi

STAB: Speech Tokenizer Assessment Benchmark

Sep 04, 2024

O-1: Self-training with Oracle and 1-best Hypothesis

Aug 14, 2023

Large-scale Language Model Rescoring on Long-form Data

Jun 13, 2023

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

Mar 10, 2023

Modular Hybrid Autoregressive Transducer

Oct 31, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

Sep 13, 2022

Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Oct 08, 2020

End-to-End Spoken Language Understanding Without Full Transcripts

Sep 30, 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Jun 16, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300

Jan 20, 2020