Masayasu Muraoka

Robust ASR Error Correction with Conservative Data Filtering

Jul 18, 2024

Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

Abstract: Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise which can make the EC models brittle, e.g., by inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g., source phonemes). Using these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, a process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR using a strong Conformer-CTC as the baseline and finetune Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in the challenging OOD settings.
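The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ECPair`, `conservative_filter`, `toy_acceptability`, and `toy_is_inferable` are hypothetical, the vocabulary-based acceptability score and the character-similarity threshold are toy stand-ins for the paper's criteria (which refer to linguistic acceptability and context such as source phonemes), and the English strings are illustrative only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class ECPair:
    source: str  # ASR hypothesis
    target: str  # gold reference used as the correction target


# Hypothetical toy vocabulary; a real system would use a learned acceptability model.
TOY_VOCAB = {"the", "cat", "sat", "on", "mat"}


def toy_acceptability(text: str) -> float:
    """Toy stand-in for criterion (1): fewer unknown words means more acceptable."""
    return -float(sum(1 for word in text.split() if word not in TOY_VOCAB))


def toy_is_inferable(source: str, target: str, threshold: float = 0.6) -> bool:
    """Toy stand-in for criterion (2): surface similarity instead of phonemes."""
    return SequenceMatcher(None, source, target).ratio() >= threshold


def conservative_filter(
    pairs: List[ECPair],
    acceptability: Callable[[str], float],
    is_inferable: Callable[[str, str], bool],
) -> List[ECPair]:
    """Keep a pair only if it meets both criteria; otherwise train on 'no correction'."""
    filtered = []
    for pair in pairs:
        improves = acceptability(pair.target) > acceptability(pair.source)
        if improves and is_inferable(pair.source, pair.target):
            filtered.append(pair)
        else:
            # Low-quality pair: the target becomes the source, so the model
            # learns to leave such inputs unchanged.
            filtered.append(ECPair(pair.source, pair.source))
    return filtered
```

Under these assumptions, a pair such as `("the cat sat on the mad", "the cat sat on the mat")` is kept, while a pair whose target adds material that cannot be recovered from the source is rewritten so that the target equals the source.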

INDUS: Effective and Efficient Language Models for Scientific Applications

May 17, 2024

Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey(+24 more)

Abstract: Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated that LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Sep 07, 2023

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

Abstract: Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of an LLM (e.g., the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. Although the approach is conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.

* Submitted to ICASSP 2024
