Patrick Huber

Scaling Parameter-Constrained Language Models with Quality Data

Oct 04, 2024

Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra

Abstract: Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling laws by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over 200 models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated that the estimated constants yield a +0.83 Pearson correlation with true accuracies, and analyzed them in scenarios involving widely-used data techniques such as data sampling and synthesis, which aim to improve data quality.

* Accepted to EMNLP 2024 Industry Track, 18 pages, 9 figures, 4 tables

CoDi: Conversational Distillation for Grounded Question Answering

Aug 20, 2024

Patrick Huber, Arash Einolghozati, Rylan Conway, Kanika Narang, Matt Smith, Waqar Nayyar, Adithya Sagar, Ahmed Aly, Akshat Shrivastava

Abstract: Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel data distillation framework named CoDi (short for Conversational Distillation, pronounced "Cody"), allowing us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner. Specifically, while our framework is task agnostic at its core, we explore and evaluate the potential of CoDi on the task of conversational grounded reasoning for question answering. This is a typical on-device scenario for specialist SLMs, allowing for open-domain model responses, without requiring the model to "memorize" world knowledge in its limited weights. Our evaluations show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics. Additionally, when using our framework to generate larger datasets from web data, our models surpass larger, instruction-tuned models in zero-shot conversational grounded reasoning tasks.

* 13 pages

Small But Funny: A Feedback-Driven Approach to Humor Distillation

Feb 28, 2024

Sahithya Ravi, Patrick Huber, Akshat Shrivastava, Aditya Sagar, Ahmed Aly, Vered Shwartz, Arash Einolghozati

Abstract: The emergence of Large Language Models (LLMs) has brought to light promising language generation capabilities, particularly in performing tasks like complex reasoning and creative writing. Consequently, distillation through imitation of teacher responses has emerged as a popular technique to transfer knowledge from LLMs to more accessible, Small Language Models (SLMs). While this works well for simpler tasks, there is a substantial performance gap on tasks requiring intricate language comprehension and creativity, such as humor generation. We hypothesize that this gap may stem from the fact that creative tasks might be hard to learn by imitation alone, and explore whether an approach involving supplementary guidance from the teacher could yield higher performance. To address this, we study the effect of assigning a dual role to the LLM: as a "teacher" generating data, as well as a "critic" evaluating the student's performance. Our experiments on humor generation reveal that the incorporation of feedback significantly narrows the performance gap between SLMs and their larger counterparts compared to merely relying on imitation. As a result, our research highlights the potential of using feedback as an additional dimension to data when transferring complex language abilities via distillation.

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Feb 16, 2024

Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook

Abstract: Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which require not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfactory. In this work, we propose a novel approach, FnCTOD, for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance, beating the SOTA by 5.6% Avg. JGA. Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We plan to open-source experimental code and model.

FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

Oct 20, 2023

Yuxiao Qu, Jinmeng Rao, Song Gao, Qianheng Zhang, Wei-Lun Chao, Yu Su, Michelle Miller, Alfonso Morales, Patrick Huber

Abstract: Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of the geospatial resilience of multicommodity food flow networks, which are one type of spatial network. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning for food supply network resilience analysis across geographical regions. This paper also discusses FLEE-GNN's innovative data generation techniques, experimental designs, and future directions for improvement. The results show the advancements of this approach to quantifying the resilience of multicommodity food flow networks, contributing to efforts towards ensuring global food security using AI methods. The developed FLEE-GNN has the potential to be applied in other spatial networks with spatially heterogeneous sub-network distributions.

* ACM SIGSPATIAL GeoAI 2023
* 10 pages, 5 figures

Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues

Feb 12, 2023

Chuyuan Li, Patrick Huber, Wen Xiao, Maxime Amblard, Chloé Braud, Giuseppe Carenini

Abstract: Discourse processing suffers from data sparsity, especially for dialogues. As a result, we explore approaches to build discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs). We investigate multiple tasks for fine-tuning and show that the dialogue-tailored Sentence Ordering task performs best. To locate and exploit discourse information in PLMs, we propose an unsupervised and a semi-supervised method. Our proposals achieve encouraging results on the STAC corpus, with F1 scores of 57.2 and 59.3 for the unsupervised and semi-supervised methods, respectively. When restricted to projective trees, our scores improve to 63.3 and 68.1.

Towards Domain-Independent Supervised Discourse Parsing Through Gradient Boosting

Oct 18, 2022

Patrick Huber, Giuseppe Carenini

Abstract: Discourse analysis and discourse parsing have shown great impact on many important problems in the field of Natural Language Processing (NLP). Given the direct impact of discourse annotations on model performance and interpretability, robustly extracting discourse structures from arbitrary documents is a key task to further improve computational models in NLP. To this end, we present a new, supervised paradigm directly tackling the domain adaptation issue in discourse parsing. Specifically, we introduce the first fully supervised discourse parser designed to alleviate the domain dependency through a staged model of weak classifiers by introducing the gradient boosting framework.

* CODI 2022
* Extended Abstract. Non-Archival. 3 pages

Unsupervised Inference of Data-Driven Discourse Structures using a Tree Auto-Encoder

Oct 18, 2022

Patrick Huber, Giuseppe Carenini

Abstract: With a growing need for robust and general discourse structures in many downstream tasks and real-world applications, the current lack of high-quality, high-quantity discourse trees poses a severe shortcoming. In order to alleviate this limitation, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, due to the especially difficult annotation process to generate discourse trees, we initially develop this method to complement task-specific models in generating much larger and more diverse discourse treebanks.

* CODI 2020
* Extended Abstract. Non-Archival. 2 pages

Improving Topic Segmentation by Injecting Discourse Dependencies

Sep 18, 2022

Linzi Xing, Patrick Huber, Giuseppe Carenini

Abstract: Recent neural supervised topic segmentation models are markedly more effective than unsupervised methods, owing to the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability caused by exploiting simple linguistic cues for prediction while overlooking the more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model that injects above-sentence discourse dependency structures to encourage the model to base its topic boundary predictions more on the topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy can substantially improve its performance on intra-domain and out-of-domain data, with little increase in model complexity.

* Accepted to the 3rd Workshop on Computational Approaches to Discourse (CODI-2022) at COLING 2022

Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models

Apr 08, 2022

Patrick Huber, Giuseppe Carenini

Abstract: With a growing body of BERTology work analyzing different components of pre-trained language models, we extend this line of research through an in-depth analysis of discourse information in pre-trained and fine-tuned language models. We move beyond prior work along three dimensions: First, we describe a novel approach to infer discourse structures from arbitrarily long documents. Second, we propose a new type of analysis to explore where and how accurately intrinsic discourse is captured in the BERT and BART models. Finally, we assess how similar the generated structures are to a variety of baselines as well as their distribution within and between models.

* In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
* 9 pages
