Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tyler Loakman

Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders

Jul 08, 2025

Shun Wang, Tyler Loakman, Youbo Lei, Yi Liu, Bohao Yang, Yuting Zhao, Dong Yang, Chenghua Lin

Abstract:Large Language Models (LLMs) are traditionally viewed as black-box algorithms, therefore reducing trustworthiness and obscuring potential approaches to increasing performance on downstream tasks. In this work, we apply an effective LLM decomposition method using a dictionary-learning approach with sparse autoencoders. This helps extract monosemantic features from polysemantic LLM neurons. Remarkably, our work identifies model-internal misunderstanding, allowing the automatic reformulation of the prompts with additional annotations to improve the interpretation by LLMs. Moreover, this approach demonstrates a significant performance improvement in downstream tasks, such as mathematical reasoning and metaphor detection.

Via

Access Paper or Ask Questions

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Jun 19, 2024

Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

Figure 1 for MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Figure 2 for MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Figure 3 for MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Figure 4 for MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Abstract:Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. We additionally present a multilingual parallel metaphor corpus generated by post-editing. Our evaluation protocol is designed to estimate four aspects of MT: Metaphorical Equivalence, Emotion, Authenticity, and Quality. In doing so, we observe that translations of figurative expressions display different traits from literal ones.

Via

Access Paper or Ask Questions

ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Apr 26, 2024

Tyler Loakman, Chenghua Lin

Abstract:This paper presents a partial reproduction of Generating Fact Checking Explanations by Anatanasova et al (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criteria of Coverage. The results of our reproduction and reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.

* Accepted to HumEval at LREC-Coling 2024

Via

Access Paper or Ask Questions

Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases

Mar 20, 2024

Tyler Loakman, Chen Tang, Chenghua Lin

Abstract:Previous work in phonologically and phonetically grounded language generation has mainly focused on domains such as puns and poetry. In this article, we present new work on the generation of tongue-twisters - a form of language that is required to be conditioned on a phoneme level to maximize sound overlap, whilst maintaining semantic consistency with an input topic and still being grammatically correct. We present TwisterLister, a pipeline for generating phonologically informed tongue-twisters from Large Language Models (LLMs) that we use to generate TwistList 2.0, the largest annotated dataset of tongue-twisters to date, consisting of 17K+ examples from a combination of human and LLM authors. Our generation pipeline involves the use of a phonologically constrained vocabulary alongside LLM prompting to generate novel, non-derivative tongue-twister examples. We additionally present the results of automatic and human evaluation of smaller models trained on our generated dataset to demonstrate the extent to which phonologically motivated language types can be generated without explicit injection of phonological knowledge. Additionally, we introduce a Phoneme-Aware Constrained Decoding module (PACD) that can be integrated into any causal language model and demonstrate that this method generates good quality tongue-twisters both with and without fine-tuning the underlying language model. We also design and implement a range of automatic metrics for the task of tongue-twister generation that is phonologically motivated and captures the unique essence of tongue-twisters based on Phonemic Edit Distance (PED).

* Submitted to Computational Linguistics

Via

Access Paper or Ask Questions

A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation

Nov 19, 2023

Chen Tang, Tyler Loakman, Chenghua Lin

Abstract:Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus. This allows EtriCA to adapt to a wider range of data samples. This results in approximately 5\% improvement in automatic metrics and over 10\% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing state-of-the-art baselines. These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.

* Submitted to CSL

Via

Access Paper or Ask Questions

The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

Nov 09, 2023

Tyler Loakman, Aaron Maladry, Chenghua Lin

Figure 1 for The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

Figure 2 for The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

Figure 3 for The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

Figure 4 for The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation

Abstract:Human evaluation is often considered to be the gold standard method of evaluating a Natural Language Generation system. However, whilst its importance is accepted by the community at large, the quality of its execution is often brought into question. In this position paper, we argue that the generation of more esoteric forms of language - humour, irony and sarcasm - constitutes a subdomain where the characteristics of selected evaluator panels are of utmost importance, and every effort should be made to report demographic characteristics wherever possible, in the interest of transparency and replicability. We support these claims with an overview of each language form and an analysis of examples in terms of how their interpretation is affected by different participant variables. We additionally perform a critical survey of recent works in NLG to assess how well evaluation procedures are reported in this subdomain, and note a severe lack of open reporting of evaluator demographic information, and a significant reliance on crowdsourcing platforms for recruitment.

* Accepted to Findings of EMNLP 2023

Via

Access Paper or Ask Questions

Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Jun 28, 2023

Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Figure 1 for Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Figure 2 for Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Figure 3 for Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Figure 4 for Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Abstract:Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between graph knowledge and text. In this study, we propose a novel framework for knowledge graph enhanced dialogue generation. We dynamically construct a multi-hop knowledge graph with pseudo nodes to involve the language model in feature aggregation within the graph at all steps. To avoid the semantic biases caused by learning on vanilla subgraphs, the proposed framework applies hierarchical graph attention to aggregate graph features on pseudo nodes and then attains a global feature. Therefore, the framework can better utilise the heterogeneous features from both the post and external graph knowledge. Extensive experiments demonstrate that our framework outperforms state-of-the-art (SOTA) baselines on dialogue generation. Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge. Moreover, the language model also learns how to better select knowledge triples for a more informative response via exploiting subgraph patterns within our feature aggregation process. Our code and resources are available at https://github.com/tangg555/SaBART.

* Accepted by ACL 2023

Via

Access Paper or Ask Questions

TwistList: Resources and Baselines for Tongue Twister Generation

Jun 07, 2023

Tyler Loakman, Chen Tang, Chenghua Lin

Abstract:Previous work in phonetically-grounded language generation has mainly focused on domains such as lyrics and poetry. In this paper, we present work on the generation of tongue twisters - a form of language that is required to be phonetically conditioned to maximise sound overlap, whilst maintaining semantic consistency with an input topic, and still being grammatically correct. We present \textbf{TwistList}, a large annotated dataset of tongue twisters, consisting of 2.1K+ human-authored examples. We additionally present several benchmark systems (referred to as TwisterMisters) for the proposed task of tongue twister generation, including models that both do and do not require training on in-domain data. We present the results of automatic and human evaluation to demonstrate the performance of existing mainstream pre-trained models in this task with limited (or no) task specific training and data, and no explicit phonetic knowledge. We find that the task of tongue twister generation is challenging for models under these conditions, yet some models are still capable of generating acceptable examples of this language type.

* ACL 2023

Via

Access Paper or Ask Questions

Terminology-aware Medical Dialogue Generation

Oct 27, 2022

Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Abstract:Medical dialogue generation aims to generate responses according to a history of dialogue turns between doctors and patients. Unlike open-domain dialogue generation, this requires background knowledge specific to the medical domain. Existing generative frameworks for medical dialogue generation fall short of incorporating domain-specific knowledge, especially with regard to medical terminology. In this paper, we propose a novel framework to improve medical dialogue generation by considering features centered on domain-specific terminology. We leverage an attention mechanism to incorporate terminologically centred features, and fill in the semantic gap between medical background knowledge and common utterances by enforcing language models to learn terminology representations with an auxiliary terminology recognition task. Experimental results demonstrate the effectiveness of our approach, in which our proposed framework outperforms SOTA language models. Additionally, we provide a new dataset with medical terminology annotations to support the research on medical dialogue generation. Our dataset and code are available at https://github.com/tangg555/meddialog.

* Submitted to ICASSP 2023

Via

Access Paper or Ask Questions

NGEP: A Graph-based Event Planning Framework for Story Generation

Oct 19, 2022

Chen Tang, Zhihao Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Figure 1 for NGEP: A Graph-based Event Planning Framework for Story Generation

Figure 2 for NGEP: A Graph-based Event Planning Framework for Story Generation

Figure 3 for NGEP: A Graph-based Event Planning Framework for Story Generation

Figure 4 for NGEP: A Graph-based Event Planning Framework for Story Generation

Abstract:To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e. storylines) to guide story generation. Such prior works mostly employ end-to-end neural generation models to predict event sequences for a story. However, such generation models struggle to guarantee the narrative coherence of separate events due to the hallucination problem, and additionally the generated event sequences are often hard to control due to the end-to-end nature of the models. To address these challenges, we propose NGEP, an novel event planning framework which generates an event sequence by performing inference on an automatically constructed event graph and enhances generalisation ability through a neural event advisor. We conduct a range of experiments on multiple criteria, and the results demonstrate that our graph-based neural framework outperforms the state-of-the-art (SOTA) event planning approaches, considering both the performance of event sequence generation and the effectiveness on the downstream task of story generation.

* AACL 2022

Via

Access Paper or Ask Questions