Shashi Bhushan TN

Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization

Feb 29, 2024

Md Tahmid Rahman Laskar, Elena Khasanova, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN

Abstract: This work focuses on the task of query-based meeting summarization, in which the summary of a context (meeting transcript) is generated in response to a specific query. When using Large Language Models (LLMs) for this task, a new call to the LLM inference endpoint/API is required for each new query even if the context stays the same. However, repeated calls to the LLM inference endpoints would significantly increase the costs of using them in production, making LLMs impractical for many real-world use cases. To address this problem, in this paper, we investigate whether combining the queries for the same input context in a single prompt to minimize repeated calls can be successfully used in meeting summarization. In this regard, we conduct extensive experiments by comparing the performance of various popular LLMs: GPT-4, PaLM-2, LLaMA-2, Mistral, and FLAN-T5 in single-query and multi-query settings. We observe that while most LLMs tend to respond to the multi-query instructions, almost all of them (except GPT-4), even after fine-tuning, could not properly generate the response in the required output format. We conclude that while multi-query prompting could be useful to optimize the inference costs by reducing calls to the inference endpoints/APIs for the task of meeting summarization, the capability to reliably generate the response in the expected format is limited to certain LLMs.
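
The multi-query idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the JSON output format, and the helper names `build_multi_query_prompt` and `parse_multi_query_response` are all assumptions made for illustration. It shows how several queries over one transcript might be packed into a single prompt, and how a caller could reject a response that does not follow the requested format.

```python
import json


def build_multi_query_prompt(transcript, queries):
    """Pack several queries about one transcript into a single prompt.

    The instruction wording and JSON format are illustrative assumptions.
    """
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(queries, start=1))
    return (
        "Answer every query below using only the meeting transcript.\n"
        "Return a JSON object whose keys are the query numbers as strings "
        "and whose values are the summaries.\n\n"
        f"Queries:\n{numbered}\n\nTranscript:\n{transcript}"
    )


def parse_multi_query_response(raw, n_queries):
    """Return the answers in query order, or None if the format is wrong."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    keys = [str(i) for i in range(1, n_queries + 1)]
    # Require exactly one answer per query, with no missing or extra keys.
    if set(data) != set(keys):
        return None
    if not all(isinstance(data[k], str) for k in keys):
        return None
    return [data[k] for k in keys]
```

With a check like this, one call can replace several single-query calls, and a response in the wrong format can be detected and handled by falling back to single-query prompting.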

Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

Feb 01, 2024

Xue-Yong Fu, Md Tahmid Rahman Laskar, Elena Khasanova, Cheng Chen, Shashi Bhushan TN

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real world is not trivial, as it requires substantial computing resources. In this paper, we investigate whether smaller, compact LLMs are a good alternative to the comparatively larger LLMs to address significant costs associated with utilizing LLMs in the real world. In this regard, we study the meeting summarization task in a real-world industrial environment and conduct extensive experiments by comparing the performance of fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) with zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2). We observe that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs in meeting summarization datasets. However, a notable exception is FLAN-T5 (780M parameters), which performs on par with or even better than many zero-shot larger LLMs (from 7B to above 70B parameters), while being significantly smaller. This makes compact LLMs like FLAN-T5 a suitable cost-efficient solution for real-world industrial deployment.

* The first two authors contributed equally to this work

Building Real-World Meeting Summarization Systems using Large Language Models: A Practical Perspective

Nov 08, 2023

Md Tahmid Rahman Laskar, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN

Abstract: This paper studies how to effectively build meeting summarization systems for real-world usage using large language models (LLMs). For this purpose, we conduct an extensive evaluation and comparison of various closed-source and open-source LLMs, namely, GPT-4, GPT-3.5, PaLM-2, and LLaMA-2. Our findings reveal that most closed-source LLMs are generally better in terms of performance. However, much smaller open-source models like LLaMA-2 (7B and 13B) could still achieve performance comparable to the large closed-source models even in zero-shot scenarios. Considering the privacy concerns of closed-source models for only being accessible via API, alongside the high cost associated with using fine-tuned versions of the closed-source models, the open-source models that can achieve competitive performance are more advantageous for industrial use. Balancing performance with associated costs and privacy concerns, the LLaMA-2-7B model looks more promising for industrial usage. In sum, this paper offers practical insights on using LLMs for real-world business meeting summarization, shedding light on the trade-offs between performance and cost.

* EMNLP 2023 Industry Track

Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs

Nov 01, 2023

Xue-Yong Fu, Md Tahmid Rahman Laskar, Cheng Chen, Shashi Bhushan TN

Abstract: In recent years, Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities, surpassing those seen in earlier language models. A particularly intriguing application of LLMs is their role as evaluators for texts produced by various generative models. In this study, we delve into the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models. Initially, we introduce an innovative approach for factuality assessment using LLMs. This entails employing a singular LLM for the entirety of the question-answering-based factuality scoring process. Following this, we examine the efficacy of various LLMs in direct factuality scoring, benchmarking them against traditional measures and human annotations. Contrary to initial expectations, our results indicate a lack of significant correlations between factuality metrics and human evaluations, specifically for GPT-4 and PaLM-2. Notable correlations were only observed with GPT-3.5 across two factuality subcategories. These consistent findings across various factual error categories suggest a fundamental limitation in the current LLMs' capability to accurately gauge factuality.

* accepted by Generation, Evaluation & Metrics (GEM) Workshop at EMNLP 2023

Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop

Nov 02, 2022

Md Tahmid Rahman Laskar, Cheng Chen, Xue-Yong Fu, Shashi Bhushan TN

Abstract: Telephone transcription data can be very noisy due to speech recognition errors, disfluencies, etc. Not only is annotating such data very challenging for the annotators, but such data may also contain many annotation errors even after the annotation job is completed, resulting in very poor model performance. In this paper, we present an active learning framework that leverages human-in-the-loop learning to identify data samples from the annotated dataset for re-annotation that are more likely to contain annotation errors. In this way, we largely reduce the need to re-annotate the whole dataset. We conduct extensive experiments with our proposed approach for Named Entity Recognition and observe that by re-annotating only about 6% of the training instances in the whole dataset, the F1 score for a certain entity type can be significantly improved by about 25%.

* The final version of this paper will be published in the Proceedings of the DaSH Workshop @ EMNLP 2022. This paper is accepted for presentation in both DaSH@EMNLP 2022 and HiLL@NIPS 2022

Entity-level Sentiment Analysis in Contact Center Telephone Conversations

Oct 26, 2022

Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shayna Gardiner, Pooja Hiranandani, Shashi Bhushan TN

Abstract: Entity-level sentiment analysis predicts the sentiment about entities mentioned in a given text. It is very useful in a business context to understand user emotions towards certain entities, such as products or companies. In this paper, we demonstrate how we developed an entity-level sentiment analysis system that analyzes English telephone conversation transcripts in contact centers to provide business insight. We present two approaches, one entirely based on the transformer-based DistilBERT model, and another that uses a convolutional neural network supplemented with some heuristic rules.

* EMNLP 2022

An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts

Sep 27, 2022

Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN, Simon Corston-Oliver

Abstract: We present a simple yet effective method to train a named entity recognition (NER) model that operates on business telephone conversation transcripts that contain noise due to the nature of spoken conversation and artifacts of automatic speech recognition. We first fine-tune LUKE, a state-of-the-art NER model, on a limited number of transcripts, then use it as the teacher model to teach a smaller DistilBERT-based student model using a large amount of weakly labeled data and a small amount of human-annotated data. The model achieves high accuracy while also satisfying the practical constraints for inclusion in a commercial telephony product: real-time performance when deployed on cost-effective CPUs rather than GPUs.

BLINK with Elasticsearch for Efficient Entity Linking in Business Conversations

May 09, 2022

Md Tahmid Rahman Laskar, Cheng Chen, Aliaksandr Martsinovich, Jonathan Johnston, Xue-Yong Fu, Shashi Bhushan TN, Simon Corston-Oliver

Abstract: An Entity Linking system aligns the textual mentions of entities in a text to their corresponding entries in a knowledge base. However, deploying a neural entity linking system for efficient real-time inference in production environments is a challenging task. In this work, we present a neural entity linking system that connects the product and organization type entities in business conversations to their corresponding Wikipedia and Wikidata entries. The proposed system leverages Elasticsearch to ensure inference efficiency when deployed in a resource-limited cloud machine, and obtains significant improvements in terms of inference speed and memory consumption while retaining high accuracy.

* NAACL 2022

Improving Punctuation Restoration for Speech Transcripts via External Data

Oct 01, 2021

Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN, Simon Corston-Oliver

Abstract: Automatic Speech Recognition (ASR) systems generally do not produce punctuated transcripts. To make transcripts more readable and follow the expected input format for downstream language models, it is necessary to add punctuation marks. In this paper, we tackle the punctuation restoration problem specifically for noisy text (e.g., phone conversation scenarios). To leverage the available written text datasets, we introduce a data sampling technique based on an n-gram language model to sample more training data that are similar to our in-domain data. Moreover, we propose a two-stage fine-tuning approach that utilizes the sampled external data as well as our in-domain dataset for models based on BERT. Extensive experiments show that the proposed approach outperforms the baseline with an improvement of 1.12% F1 score.

* Accepted by W-NUT at EMNLP 2021
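
The abstract above mentions sampling external data with an n-gram language model but gives no details, so the following Python sketch is only one plausible reading, not the paper's method. It assumes a bigram model with add-one smoothing, whitespace tokenization, and ranking by average log-probability; the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over in-domain sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(toks)
        contexts.update(toks[:-1])          # every token that starts a bigram
        bigrams.update(zip(toks, toks[1:]))
    return contexts, bigrams, len(vocab)


def avg_log_prob(sentence, model):
    """Average add-one-smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    pairs = list(zip(toks, toks[1:]))
    total = sum(
        math.log((bigrams[p] + 1) / (contexts[p[0]] + vocab_size))
        for p in pairs
    )
    return total / len(pairs)


def sample_similar(external, in_domain, k):
    """Keep the k external sentences the in-domain model scores highest."""
    model = train_bigram(in_domain)
    ranked = sorted(external, key=lambda s: avg_log_prob(s, model), reverse=True)
    return ranked[:k]
```

Averaging over bigrams keeps the score from simply favoring short sentences; a real system would likely use a higher-order model and a stronger smoothing method.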
