Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Milan Aggarwal

It Helps to Take a Second Opinion: Teaching Smaller LLMs to Deliberate Mutually via Selective Rationale Optimisation

Mar 04, 2025

Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy

Abstract:Very large language models (LLMs) such as GPT-4 have shown the ability to handle complex tasks by generating and self-refining step-by-step rationales. Smaller language models (SLMs), typically with < 13B parameters, have been improved by using the data generated from very-large LMs through knowledge distillation. However, various practical constraints such as API costs, copyright, legal and ethical policies restrict using large (often opaque) models to train smaller models for commercial use. Limited success has been achieved at improving the ability of an SLM to explore the space of possible rationales and evaluate them by itself through self-deliberation. To address this, we propose COALITION, a trainable framework that facilitates interaction between two variants of the same SLM and trains them to generate and refine rationales optimized for the end-task. The variants exhibit different behaviors to produce a set of diverse candidate rationales during the generation and refinement steps. The model is then trained via Selective Rationale Optimization (SRO) to prefer generating rationale candidates that maximize the likelihood of producing the ground-truth answer. During inference, COALITION employs a controller to select the suitable variant for generating and refining the rationales. On five different datasets covering mathematical problems, commonsense reasoning, and natural language inference, COALITION outperforms several baselines by up to 5%. Our ablation studies reveal that cross-communication between the two variants performs better than using the single model to self-refine the rationales. We also demonstrate the applicability of COALITION for LMs of varying scales (4B to 14B parameters) and model families (Mistral, Llama, Qwen, Phi). We release the code for this work at https://github.com/Sohanpatnaik106/coalition.

* Accepted at ICLR 2025

Via

Access Paper or Ask Questions

CABINET: Content Relevance based Noise Reduction for Table Question Answering

Feb 05, 2024

Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumit Bhatia, Yaman Kumar, Balaji Krishnamurthy

Figure 1 for CABINET: Content Relevance based Noise Reduction for Table Question Answering

Figure 2 for CABINET: Content Relevance based Noise Reduction for Table Question Answering

Figure 3 for CABINET: Content Relevance based Noise Reduction for Table Question Answering

Figure 4 for CABINET: Content Relevance based Noise Reduction for Table Question Answering

Abstract:Table understanding capability of Large Language Models (LLMs) has been extensively studied through the task of question-answering (QA) over tables. Typically, only a small part of the whole table is relevant to derive the answer for a given question. The irrelevant parts act as noise and are distracting information, resulting in sub-optimal performance due to the vulnerability of LLMs to noise. To mitigate this, we propose CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) - a framework to enable LLMs to focus on relevant tabular data by suppressing extraneous information. CABINET comprises an Unsupervised Relevance Scorer (URS), trained differentially with the QA LLM, that weighs the table content based on its relevance to the input question before feeding it to the question-answering LLM (QA LLM). To further aid the relevance scorer, CABINET employs a weakly supervised module that generates a parsing statement describing the criteria of rows and columns relevant to the question and highlights the content of corresponding table cells. CABINET significantly outperforms various tabular LLM baselines, as well as GPT3-based in-context learning methods, is more robust to noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on WikiTQ, FeTaQA, and WikiSQL datasets. We release our code and datasets at https://github.com/Sohanpatnaik106/CABINET_QA.

* Accepted at ICLR 2024 (spotlight)

Via

Access Paper or Ask Questions

Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems

Jul 14, 2023

Shivani Kumar, Sumit Bhatia, Milan Aggarwal, Tanmoy Chakraborty

Figure 1 for Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems

Figure 2 for Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems

Figure 3 for Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems

Figure 4 for Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems

Abstract:Sharing ideas through communication with peers is the primary mode of human interaction. Consequently, extensive research has been conducted in the area of conversational AI, leading to an increase in the availability and diversity of conversational tasks, datasets, and methods. However, with numerous tasks being explored simultaneously, the current landscape of conversational AI becomes fragmented. Therefore, initiating a well-thought-out model for a dialogue agent can pose significant challenges for a practitioner. Towards highlighting the critical ingredients needed for a practitioner to design a dialogue agent from scratch, the current study provides a comprehensive overview of the primary characteristics of a dialogue agent, the supporting tasks, their corresponding open-domain datasets, and the methods used to benchmark these datasets. We observe that different methods have been used to tackle distinct dialogue tasks. However, building separate models for each task is costly and does not leverage the correlation among the several tasks of a dialogue agent. As a result, recent trends suggest a shift towards building unified foundation models. To this end, we propose UNIT, a UNified dIalogue dataseT constructed from conversations of existing datasets for different dialogue tasks capturing the nuances for each of them. We also examine the evaluation strategies used to measure the performance of dialogue agents and highlight the scope for future research in the area of conversational AI.

* 21 pages, 3 figures, 3 tables

Via

Access Paper or Ask Questions

INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Large Language Models

May 11, 2023

H S V N S Kowndinya Renduchintala, Krishnateja Killamsetty, Sumit Bhatia, Milan Aggarwal, Ganesh Ramakrishnan, Rishabh Iyer, Balaji Krishnamurthy

Figure 1 for INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Large Language Models

Figure 2 for INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Large Language Models

Figure 3 for INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Large Language Models

Figure 4 for INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Large Language Models

Abstract:A salient characteristic of large pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we are witnessing the development of enormous models pushing the state-of-the-art. It is, however, imperative to realize that this inevitably leads to prohibitively long training times, extortionate computing costs, and a detrimental environmental impact. Significant efforts are underway to make PTLM training more efficient through innovations in model architectures, training pipelines, and loss function design, with scant attention being paid to optimizing the utility of training data. The key question that we ask is whether it is possible to train PTLMs by employing only highly informative subsets of the training data while maintaining downstream performance? Building upon the recent progress in informative data subset selection, we show how we can employ submodular optimization to select highly representative subsets of the training corpora. Our results demonstrate that the proposed framework can be applied to efficiently train multiple PTLMs (BERT, BioBERT, GPT-2) using only a fraction of data while retaining up to $\sim99\%$ of the performance of the fully-trained models.

Via

Access Paper or Ask Questions

One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Sep 12, 2022

Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial, Mausoom Sarkar, Balaji Krishnamurthy

Figure 1 for One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Figure 2 for One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Figure 3 for One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Figure 4 for One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Abstract:Active consumption of digital documents has yielded scope for research in various applications, including search. Traditionally, searching within a document has been cast as a text matching problem ignoring the rich layout and visual cues commonly present in structured documents, forms, etc. To that end, we ask a mostly unexplored question: "Can we search for other similar snippets present in a target document page given a single query instance of a document snippet?". We propose MONOMER to solve this as a one-shot snippet detection task. MONOMER fuses context from visual, textual, and spatial modalities of snippets and documents to find query snippet in target documents. We conduct extensive ablations and experiments showing MONOMER outperforms several baselines from one-shot object detection (BHRL), template matching, and document understanding (LayoutLMv3). Due to the scarcity of relevant data for the task at hand, we train MONOMER on programmatically generated data having many visually similar query snippets and target document pairs from two datasets - Flamingo Forms and PubLayNet. We also do a human study to validate the generated data.

Via

Access Paper or Ask Questions

Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Aug 20, 2022

Yaman Kumar Singla, Rajat Jha, Arunim Gupta, Milan Aggarwal, Aditya Garg, Ayush Bhardwaj, Tushar, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

Figure 1 for Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Figure 2 for Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Figure 3 for Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Figure 4 for Persuasion Strategies in Advertisements: Dataset, Modeling, and Baselines

Abstract:Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivated by persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies. We then formulate the task of persuasion strategy prediction with multi-modal learning, where we design a multi-task attention fusion model that can leverage other ad-understanding tasks to predict persuasion strategies. Further, we conduct a real-world case study on 1600 advertising campaigns of 30 Fortune-500 companies where we use our model's predictions to analyze which strategies work with different demographics (age and gender). The dataset also provides image segmentation masks, which labels persuasion strategies in the corresponding ad images on the test split. We publicly release our code and dataset https://midas-research.github.io/persuasion-advertisements/.

Via

Access Paper or Ask Questions

LM-CORE: Language Models with Contextually Relevant External Knowledge

Aug 12, 2022

Jivat Neet Kaur, Sumit Bhatia, Milan Aggarwal, Rachit Bansal, Balaji Krishnamurthy

Figure 1 for LM-CORE: Language Models with Contextually Relevant External Knowledge

Figure 2 for LM-CORE: Language Models with Contextually Relevant External Knowledge

Figure 3 for LM-CORE: Language Models with Contextually Relevant External Knowledge

Figure 4 for LM-CORE: Language Models with Contextually Relevant External Knowledge

Abstract:Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provide explicit access to contextually relevant structured knowledge to the model and train it to use that knowledge. We present LM-CORE -- a general framework to achieve this -- that allows \textit{decoupling} of the language model training from the external knowledge source and allows the latter to be updated without affecting the already trained model. Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks. We also present a thorough error analysis highlighting the successes and failures of LM-CORE.

* Published at Findings of NAACL, 2022

Via

Access Paper or Ask Questions

CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Jun 17, 2022

Rachit Bansal, Milan Aggarwal, Sumit Bhatia, Jivat Neet Kaur, Balaji Krishnamurthy

Figure 1 for CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Figure 2 for CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Figure 3 for CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Figure 4 for CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Abstract:Pre-trained Language Models (PTLMs) have been shown to perform well on natural language tasks. Many prior works have leveraged structured commonsense present in the form of entities linked through labeled relations in Knowledge Graphs (KGs) to assist PTLMs. Retrieval approaches use KG as a separate static module which limits coverage since KGs contain finite knowledge. Generative methods train PTLMs on KG triples to improve the scale at which knowledge can be obtained. However, training on symbolic KG entities limits their applicability in tasks involving natural language text where they ignore overall context. To mitigate this, we propose a CommonSense Contextualizer (CoSe-Co) conditioned on sentences as input to make it generically usable in tasks for generating knowledge relevant to the overall context of input text. To train CoSe-Co, we propose a novel dataset comprising of sentence and commonsense knowledge pairs. The knowledge inferred by CoSe-Co is diverse and contain novel entities not present in the underlying KG. We augment generated knowledge in Multi-Choice QA and Open-ended CommonSense Reasoning tasks leading to improvements over current best methods on CSQA, ARC, QASC and OBQA datasets. We also demonstrate its applicability in improving performance of a baseline model for paraphrase generation task.

* Accepted at NAACL 2022 (main conference)

Via

Access Paper or Ask Questions

INDIGO: Intrinsic Multimodality for Domain Generalization

Jun 13, 2022

Puneet Mangla, Shivam Chandhok, Milan Aggarwal, Vineeth N Balasubramanian, Balaji Krishnamurthy

Figure 1 for INDIGO: Intrinsic Multimodality for Domain Generalization

Abstract:For models to generalize under unseen domains (a.k.a domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that makes up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated from this, we study how multimodal information from existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks along with the visual modality to enhance generalization to unseen domains at test-time. We experiment on several Domain Generalization settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.

* Under Submission

Via

Access Paper or Ask Questions

Form2Seq : A Framework for Higher-Order Form Structure Extraction

Jul 09, 2021

Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy

Figure 1 for Form2Seq : A Framework for Higher-Order Form Structure Extraction

Figure 2 for Form2Seq : A Framework for Higher-Order Form Structure Extraction

Figure 3 for Form2Seq : A Framework for Higher-Order Form Structure Extraction

Figure 4 for Form2Seq : A Framework for Higher-Order Form Structure Extraction

Abstract:Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully-convolution networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages relative spatial arrangement of structures. We discuss two tasks; 1) Classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanism in forms. To achieve this, we arrange the constituent elements linearly in natural reading order, feed their spatial and textual representations to Seq2Seq framework, which sequentially outputs prediction of each element depending on the final task. We modify Seq2Seq for grouping task and discuss improvements obtained through cascaded end-to-end training of two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach achieving an accuracy of 90% on classification task and an F1 of 75.82, 86.01, 61.63 on groups discussed above respectively, outperforming segmentation baselines. Further we show our framework achieves state of the results for table structure recognition on ICDAR 2013 dataset.

* This paper has been presented at EMNLP 2020

Via

Access Paper or Ask Questions