Yutong Pang

LLaMA based Punctuation Restoration With Forward Pass Only Decoding

Aug 09, 2024

Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

Abstract: This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To address these challenges, our second contribution presents Forward Pass Only Decoding (FPOD), a novel decoding approach for annotation tasks. This method yields a 19.8x improvement in inference speed, addressing a critical bottleneck and enhancing the practical utility of LLaMA for large-scale data annotation tasks without hallucinations. The combination of these contributions not only solidifies LLaMA as a powerful tool for punctuation restoration but also highlights FPOD as a crucial strategy for overcoming speed constraints.

Recovering from Privacy-Preserving Masking with Large Language Models

Sep 23, 2023

Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli

Abstract: Model adaptation is crucial to handle the discrepancy between proxy training data and the actual user data received. To effectively perform adaptation, users' textual data is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be directly trained using such in-domain data. However, this might raise privacy and security concerns due to the extra risk of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has recently been explored. In this work, we leverage large language models (LLMs) to suggest substitutes for masked tokens and evaluate their effectiveness on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets to compare these methods. Experimental results show that models trained on the obfuscation corpora achieve performance comparable to that of models trained on the original data without privacy-preserving token masking.

* Submitted to ICASSP
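
The masking step mentioned in the abstract above (replacing identifying tokens with a generic marker) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `[MASK]` marker string, the `mask_tokens` helper, and the example set of sensitive words are all hypothetical, and plain whitespace tokenization is assumed for simplicity.

```python
def mask_tokens(text, sensitive, marker="[MASK]"):
    """Replace every whitespace-delimited token found in `sensitive` with `marker`.

    Matching is case-insensitive. The marker string and the sensitive-word
    set are illustrative assumptions, not values taken from the paper.
    """
    sensitive_lower = {word.lower() for word in sensitive}
    return " ".join(
        marker if token.lower() in sensitive_lower else token
        for token in text.split()
    )


# Hypothetical example sentence and sensitive-word set.
masked = mask_tokens("call alice at the office", {"Alice"})
print(masked)
```

An LLM-based substitution step, as studied in the paper, would then propose a replacement for each marker; that step is not shown here.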

Contextual Biasing of Named-Entities with Large Language Models

Sep 22, 2023

Chuanneng Sun, Zeeshan Ahmed, Yingyi Ma, Zhe Liu, Lucas Kabela, Yutong Pang, Ozlem Kalinli

Abstract: This paper studies contextual biasing with Large Language Models (LLMs), where during second-pass rescoring additional contextual information is provided to an LLM to boost Automatic Speech Recognition (ASR) performance. We propose to leverage prompts for an LLM without fine-tuning during rescoring; these prompts incorporate a biasing list and few-shot examples to serve as additional information when calculating the score for the hypothesis. In addition to few-shot prompt learning, we propose multi-task training of the LLM to predict both the entity class and the next token. To improve the efficiency of contextual biasing and to avoid exceeding LLMs' maximum sequence lengths, we propose dynamic prompting, where we select the most likely class using the class tag prediction, and only use entities in this class as contexts for next token prediction. Word Error Rate (WER) evaluation is performed on i) an internal calling, messaging, and dictation dataset, and ii) the SLUE-Voxpopuli dataset. Results indicate that biasing lists and few-shot examples can achieve 17.8% and 9.6% relative improvement compared to first-pass ASR, and that multi-task training and dynamic prompting can achieve 20.0% and 11.3% relative WER improvement, respectively.

* 5 pages, 4 figures. Conference: ICASSP 2024
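
The abstract above reports results as relative WER improvement. As a reminder of how that metric is computed, here is a small sketch; the function name and the input WER values are hypothetical and are not figures from the paper, which reports only the relative numbers.

```python
def relative_wer_improvement(baseline_wer, new_wer):
    """Return the relative WER improvement in percent.

    Computed as 100 * (baseline - new) / baseline, so a positive value
    means the new system makes fewer errors than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical absolute WERs, chosen only to illustrate the arithmetic.
print(relative_wer_improvement(10.0, 8.0))
```

Note that a relative improvement says nothing about the absolute WER: a drop from 10.0 to 8.0 and a drop from 5.0 to 4.0 give the same relative figure.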

Language Agnostic Data-Driven Inverse Text Normalization

Jan 24, 2023

Szu-Jui Chen, Debjyoti Paul, Yutong Pang, Peng Su, Xuedong Zhang

Abstract: With the emergence of automatic speech recognition (ASR) models, converting the spoken-form text (from ASR) to the written form is urgently needed. This inverse text normalization (ITN) problem attracts the attention of researchers from various fields. Recently, several works have shown that data-driven ITN methods can output high-quality written-form text. Due to the scarcity of labeled spoken-written datasets, studies on non-English data-driven ITN are quite limited. In this work, we propose a language-agnostic data-driven ITN framework to fill this gap. Specifically, we leverage data augmentation in conjunction with neural machine translated data for low-resource languages. Moreover, we design an evaluation method for a language-agnostic ITN model when only English data is available. Our empirical evaluation shows that this language-agnostic modeling approach is effective for low-resource languages while preserving performance for high-resource languages.

Improving Data Driven Inverse Text Normalization using Data Augmentation

Jul 20, 2022

Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

Abstract: Inverse text normalization (ITN) is used to convert the spoken-form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile, neural modeling approaches require high-quality, large-scale spoken-written pair examples in the same or a similar domain as the ASR system (in-domain data) for training. Both of these approaches require costly and complex annotations. In this paper, we present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data with minimal human annotation. We empirically demonstrate that an ITN model trained using our data augmentation technique consistently outperforms an ITN model trained using only in-domain data across all numeric surfaces, such as cardinal, currency, and fraction, with an overall accuracy improvement of 14.44%.

Improving N-gram Language Models with Pre-trained Deep Transformer

Nov 22, 2019

Yiren Wang, Hongzhao Huang, Zhe Liu, Yutong Pang, Yongqiang Wang, ChengXiang Zhai, Fuchun Peng

Abstract: Although n-gram language models (LMs) have been outperformed by state-of-the-art neural LMs, they are still widely used in speech recognition due to their high efficiency in inference. In this paper, we demonstrate that n-gram LMs can be improved by neural LMs through a text-generation-based data augmentation method. In contrast to previous approaches, we employ large-scale general-domain pre-training followed by an in-domain fine-tuning strategy to construct deep Transformer-based neural LMs. A large amount of in-domain text data is generated with the well-trained deep Transformer to construct new n-gram LMs, which are then interpolated with baseline n-gram systems. Empirical studies on different speech recognition tasks show that the proposed approach can effectively improve recognition accuracy. In particular, our proposed approach brings a significant relative word error rate reduction of up to 6.0% for domains with limited in-domain data.
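
The abstract above says the new n-gram LMs are interpolated with the baseline systems but does not state how. A common choice is linear interpolation of the two models' probabilities, sketched below under that assumption. The `interpolate` helper, the weight `lam`, and the toy probability tables (next-word distributions for a single shared context) are all hypothetical.

```python
def interpolate(p_base, p_aug, lam):
    """Linearly interpolate two next-word distributions for the same context.

    Returns lam * p_base(w) + (1 - lam) * p_aug(w) for every word w seen in
    either table; a word missing from a table is treated as probability 0.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_base) | set(p_aug)
    return {
        word: lam * p_base.get(word, 0.0) + (1.0 - lam) * p_aug.get(word, 0.0)
        for word in vocab
    }


# Toy distributions: baseline LM vs. LM built from generated text.
base = {"york": 0.5, "jersey": 0.5}
aug = {"york": 0.25, "jersey": 0.25, "delhi": 0.5}
mixed = interpolate(base, aug, lam=0.5)
print(sorted(mixed.items()))
```

In this toy case the interpolated model assigns non-zero probability to a word the baseline never saw, which is the kind of coverage gain that generated in-domain text is meant to provide. In practice the weight would be tuned on held-out data.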
