Daben Liu

FB-RAG: Improving RAG with Forward and Backward Lookup

May 22, 2025

Kushal Chawla, Alfy Samuel, Anoop Kumar, Daben Liu

Abstract: The performance of Retrieval Augmented Generation (RAG) systems relies heavily on the retriever quality and the size of the retrieved context. A large enough context ensures that the relevant information is present in the input context for the LLM, but also incorporates irrelevant content that has been shown to confuse the models. On the other hand, a smaller context reduces the irrelevant information, but it often comes at the risk of losing important information necessary to answer the input question. This duality is especially challenging to manage for complex queries that contain little information to retrieve the relevant chunks from the full context. To address this, we present a novel framework, called FB-RAG, which enhances the RAG pipeline by relying on a combination of backward lookup (overlap with the query) and forward lookup (overlap with candidate reasons and answers) to retrieve specific context chunks that are the most relevant for answering the input query. Our evaluations on 9 datasets from two leading benchmarks show that FB-RAG consistently outperforms RAG and Long Context baselines developed recently for these benchmarks. We further show that FB-RAG can improve performance while reducing latency. We perform qualitative analysis of the strengths and shortcomings of our approach, providing specific insights to guide future work.
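
The combination of backward and forward lookup described in this abstract can be illustrated with a minimal sketch. It assumes plain token overlap as the relevance measure and a fixed weighted sum of the two scores. The function names (`fb_scores`, `top_k`), the weights, and the example data are hypothetical and not taken from the paper, which may use a different scorer and a different way of obtaining candidate reasons and answers.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens as a set (a crude stand-in for retriever features)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(chunk, text):
    """Fraction of the tokens in `text` that also occur in `chunk`."""
    wanted = tokens(text)
    return len(wanted & tokens(chunk)) / len(wanted) if wanted else 0.0

def fb_scores(chunks, query, candidates, w_backward=0.5, w_forward=0.5):
    """Score each chunk by backward lookup (query) and forward lookup (candidates)."""
    scored = []
    for chunk in chunks:
        backward = overlap(chunk, query)
        # Forward lookup: best overlap with any candidate reason or answer.
        forward = max((overlap(chunk, c) for c in candidates), default=0.0)
        scored.append((w_backward * backward + w_forward * forward, chunk))
    # Highest combined score first; ties keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def top_k(chunks, query, candidates, k=2, **weights):
    """Return the k chunks with the highest combined score."""
    return [chunk for _, chunk in fb_scores(chunks, query, candidates, **weights)[:k]]

# Hypothetical example data (assumption, for illustration only).
chunks = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Bananas are rich in potassium.",
]
query = "When was the famous tower in the French capital finished?"
candidates = ["The Eiffel Tower was completed in 1889."]
best = top_k(chunks, query, candidates, k=1)
```

In this toy setup the candidate answer shares far more tokens with the relevant chunk than the query does, which is the situation the abstract describes for queries that carry little retrievable information on their own.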

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Dec 09, 2024

Neel Jain, Aditya Shrivastava, Chenyang Zhu, Daben Liu, Alfy Samuel, Ashwinee Panda, Anoop Kumar, Micah Goldblum, Tom Goldstein

Abstract: A key component of building safe and reliable language models is enabling the models to appropriately refuse to follow certain instructions or answer certain questions. We may want models to output refusal messages for various categories of user queries, for example, ill-posed questions, instructions for committing illegal acts, or queries which require information past the model's knowledge horizon. Engineering models that refuse to answer such questions is complicated by the fact that an individual may want their model to exhibit varying levels of sensitivity for refusing queries of various categories, and different users may want different refusal rates. The current default approach involves training multiple models with varying proportions of refusal messages from each category to achieve the desired refusal rates, which is computationally expensive and may require training a new model to accommodate each user's desired preference over refusal rates. To address these challenges, we propose refusal tokens, one such token for each refusal category or a single refusal token, which are prepended to the model's responses during training. We then show how to increase or decrease the probability of generating the refusal token for each category during inference to steer the model's refusal behavior. Refusal tokens enable controlling a single model's refusal rates without any further fine-tuning, only by selectively intervening during generation.

* 19 pages
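
One way to picture the inference-time steering described in this abstract is an additive bias on the refusal-token logits at the first response position. The sketch below is an assumption-laden illustration: the token names (`[respond]`, `[refuse_illegal]`, `[refuse_temporal]`), the logit values, and the additive-bias mechanism are hypothetical, and the paper may intervene differently (for example by thresholding the token probability).

```python
import math

def softmax(logits):
    """Convert a {token: logit} mapping into a {token: probability} mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def steer(logits, refusal_bias):
    """Add a per-category bias to refusal-token logits and return the new probabilities."""
    adjusted = dict(logits)
    for tok, bias in refusal_bias.items():
        if tok in adjusted:
            adjusted[tok] += bias
    return softmax(adjusted)

def first_token(logits, refusal_bias=None):
    """Greedy choice of the first response token after steering."""
    probs = steer(logits, refusal_bias or {})
    return max(probs, key=probs.get)

# Hypothetical first-position logits (assumption, for illustration only).
logits = {"[respond]": 2.0, "[refuse_illegal]": 1.0, "[refuse_temporal]": 0.0}

default_choice = first_token(logits)
stricter_choice = first_token(logits, {"[refuse_illegal]": 2.0})
laxer_choice = first_token(logits, {"[refuse_illegal]": -5.0})
```

A positive bias raises the refusal rate for one category and a negative bias lowers it, with the model weights left untouched.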

Retrieval Augmented Correction of Named Entity Speech Recognition Errors

Sep 09, 2024

Ernest Pusateri, Anmol Walia, Anirudh Kashi, Bortik Bandyopadhyay, Nadia Hyder, Sayantan Mahinder, Raviteja Anantha, Daben Liu, Sashank Gondala

Abstract: In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names which appear infrequently in their training data. In parallel to the rise of end-to-end ASR systems, large language models (LLMs) have proven to be a versatile tool for various natural language processing (NLP) tasks. In NLP tasks where a database of relevant knowledge is available, retrieval augmented generation (RAG) has achieved impressive results when used with LLMs. In this work, we propose a RAG-like technique for correcting speech recognition entity name errors. Our approach uses a vector database to index a set of relevant entities. At runtime, database queries are generated from possibly errorful textual ASR hypotheses, and the entities retrieved using these queries are fed, along with the ASR hypotheses, to an LLM which has been adapted to correct ASR errors. Overall, our best system achieves 33%-39% relative word error rate reductions on synthetic test sets focused on voice assistant queries of rare music entities without regressing on the STOP test set, a publicly available voice assistant test set covering many domains.

* Submitted to ICASSP 2025
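
The retrieve-then-correct flow in this abstract can be sketched roughly as follows. This is not the authors' system: character-level string similarity from Python's `difflib` stands in for the vector database, word n-grams stand in for the generated queries, and the prompt is only assembled, not sent to any LLM. The names `retrieve_entities` and `build_prompt`, the cutoff, and the example entities are hypothetical.

```python
import difflib

def retrieve_entities(hypothesis, entity_index, per_query=1, cutoff=0.6):
    """Generate word n-gram queries from an ASR hypothesis and collect similar entities."""
    words = hypothesis.lower().split()
    by_lowercase = {entity.lower(): entity for entity in entity_index}
    hits = []
    for n in (1, 2, 3):
        for start in range(len(words) - n + 1):
            query = " ".join(words[start:start + n])
            matches = difflib.get_close_matches(
                query, list(by_lowercase), n=per_query, cutoff=cutoff
            )
            for match in matches:
                entity = by_lowercase[match]
                if entity not in hits:  # keep first-seen order, no duplicates
                    hits.append(entity)
    return hits

def build_prompt(hypothesis, entities):
    """Assemble the text that would be handed to a correction model."""
    listed = "\n".join(f"- {entity}" for entity in entities) or "- (none)"
    return (
        "Candidate entities:\n"
        f"{listed}\n"
        f"ASR hypothesis: {hypothesis}\n"
        "Corrected transcript:"
    )

# Hypothetical example data (assumption, for illustration only).
entity_index = ["Bon Iver", "Beyonce"]
hypothesis = "play beyonse"
retrieved = retrieve_entities(hypothesis, entity_index)
prompt = build_prompt(hypothesis, retrieved)
```

The point of the design is that the misrecognized name only needs to be close enough to retrieve the right entity; the final correction is left to the model that reads the prompt.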

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

Aug 27, 2021

Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi

Abstract: In this paper, we present our initial efforts for building a code-switching (CS) speech recognition system leveraging existing acoustic models (AMs) and language models (LMs), i.e., no training required, and specifically targeting intra-sentential switching. To achieve such an ambitious goal, new mechanisms for foreign pronunciation generation and language model (LM) enrichment have been devised. Specifically, we have designed an automatic approach to obtain high quality pronunciation of foreign language (FL) words in the native language (NL) phoneme set using existing acoustic phone decoders and an LSTM-based grapheme-to-phoneme (G2P) model. Improved accented pronunciations have thus been obtained by learning foreign pronunciations directly from data. Furthermore, a code-switching LM was deployed by converting the original NL LM into a CS LM using translated word pairs and borrowing statistics for the NL LM. Experimental evidence clearly demonstrates that our approach better deals with accented foreign pronunciations than techniques based on human labeling. Moreover, our best system achieves a 55.5% relative word error rate reduction from 34.4%, obtained with a conventional monolingual ASR system, to 15.3% on an intra-sentential CS task without harming the monolingual recognition accuracy.

* ICASSP 2019, 12-17 May 2019
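
The relative word error rate reduction quoted in this abstract follows directly from the two absolute figures. A short check, where the helper name `relative_reduction` is ours and only the two WER values come from the abstract:

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline

baseline_wer = 34.4  # conventional monolingual ASR system, percent (from the abstract)
cs_wer = 15.3        # best code-switching system, percent (from the abstract)

reduction_pct = round(100 * relative_reduction(baseline_wer, cs_wer), 1)
print(reduction_pct)  # 55.5
```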

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

Dec 07, 2020

Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu

Abstract: Inspired by SpecAugment -- a data augmentation method for end-to-end ASR systems, we propose a frame-level SpecAugment method (f-SpecAugment) to improve the performance of deep convolutional neural networks (CNN) for hybrid HMM based ASR systems. Similar to the utterance level SpecAugment, f-SpecAugment performs three transformations: time warping, frequency masking, and time masking. Instead of applying the transformations at the utterance level, f-SpecAugment applies them to each convolution window independently during training. We demonstrate that f-SpecAugment is more effective than the utterance level SpecAugment for deep CNN based hybrid models. We evaluate the proposed f-SpecAugment on 50-layer Self-Normalizing Deep CNN (SNDCNN) acoustic models trained with up to 25000 hours of training data. We observe that f-SpecAugment reduces WER by 0.5-4.5% relative across different ASR tasks for four languages. As the benefits of augmentation techniques tend to diminish as training data size increases, the large scale training reported is important in understanding the effectiveness of f-SpecAugment. Our experiments demonstrate that even with 25k hours of training data, f-SpecAugment is still effective. We also demonstrate that f-SpecAugment has benefits approximately equivalent to doubling the amount of training data for deep CNNs.

* To appear in SLT 2021
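
The difference between utterance-level and frame-level masking can be made concrete with a small sketch. It covers only frequency and time masking (time warping is omitted), draws one independent mask per window, and repeats edge frames to pad the context. The window size, mask widths, and function names (`mask_window`, `frame_level_augment`) are assumptions for illustration and not the paper's settings.

```python
import random

def mask_window(window, rng, max_freq_mask=2, max_time_mask=2):
    """Return a copy of one (time x freq) window with a frequency band and a time span zeroed."""
    n_time, n_freq = len(window), len(window[0])
    out = [row[:] for row in window]
    f_width = rng.randint(0, min(max_freq_mask, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    t_width = rng.randint(0, min(max_time_mask, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for t in range(n_time):
        for f in range(n_freq):
            in_freq_mask = f_start <= f < f_start + f_width
            in_time_mask = t_start <= t < t_start + t_width
            if in_freq_mask or in_time_mask:
                out[t][f] = 0.0
    return out

def frame_level_augment(features, context=2, seed=0):
    """Build one context window per frame and mask each window independently."""
    rng = random.Random(seed)
    n_frames = len(features)
    windows = []
    for center in range(n_frames):
        # Clamp indices at the utterance edges, so edge frames are repeated.
        indices = [
            min(max(i, 0), n_frames - 1)
            for i in range(center - context, center + context + 1)
        ]
        window = [features[i] for i in indices]
        windows.append(mask_window(window, rng))
    return windows

# Hypothetical input: 6 frames of 4 filterbank values each (assumption).
features = [[1.0] * 4 for _ in range(6)]
windows = frame_level_augment(features, context=2, seed=0)
```

Because each window gets its own random mask, the same input frame can appear masked in one training window and intact in a neighboring one, which is the contrast with applying a single mask to the whole utterance.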

SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition

Oct 09, 2019

Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu

Abstract: Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear units (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower word error rate (WER) while at the same time improving both training and inference speed by 60%-80%. We also explore other model inference optimizations to further reduce latency for production use.
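
The self-normalizing property that this abstract relies on can be checked numerically for a single activation. The sketch below uses the standard SELU constants and feeds standard-normal inputs through SELU and ReLU; the helper `output_stats`, the sample size, and the seed are our own choices, and this is a one-unit illustration, not the SNDCNN model.

```python
import math
import random

# Standard SELU constants from the Self-Normalizing Neural Networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit for a single float."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def relu(x):
    """Rectified linear unit for a single float."""
    return max(0.0, x)

def output_stats(activation, n=50_000, seed=0):
    """Mean and variance of activation(z) for z drawn from a standard normal."""
    rng = random.Random(seed)
    outputs = [activation(rng.gauss(0.0, 1.0)) for _ in range(n)]
    mean = sum(outputs) / n
    variance = sum((y - mean) ** 2 for y in outputs) / n
    return mean, variance

selu_mean, selu_var = output_stats(selu)
relu_mean, relu_var = output_stats(relu)
```

With standard-normal inputs, SELU outputs stay close to zero mean and unit variance, while ReLU shifts the mean upward and shrinks the variance. That drift is what batch normalization normally corrects, which is consistent with the abstract's claim that SELU removes the need for it.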
