Alireza Mohammadshahi

Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration

Jan 25, 2024

Alireza Mohammadshahi, Ali Shaikh, Majid Yazdani

Abstract: In this paper, we propose an architecture to harness the collective knowledge of multiple trained LLMs to create a new state-of-the-art. At the core of this framework is an LLM-based orchestrator that is adept at picking the right underlying LLM experts for optimal task execution. Inspired by self-play in reinforcement learning, we created a loop of query generation, orchestration, and evaluation to generate training data for the orchestrator. Our evaluation focused on the MMLU benchmark, employing models with 7B, 13B, and 34B parameters available on Hugging Face. The results demonstrate new state-of-the-art open-source models: our Leeroo orchestrator achieves performance on par with the Mixtral model while incurring only two-thirds of its cost. Moreover, when the allowed cost is increased, it surpasses Mixtral's accuracy by over 5% at the same cost level, reaching an accuracy of 75.9%. Further enhancements were observed when integrating GPT4 into the underlying model pool. The Leeroo orchestrator nearly matches GPT4's performance at half the cost and even exceeds GPT4's results with a 25% cost reduction. These findings illustrate the potential of our architecture in creating state-of-the-art and cost-effective LLMs by optimizing the synergy between multiple LLMs to achieve superior performance outcomes.
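The cost/accuracy trade-off described in the abstract can be illustrated with a toy routing rule. This is only a sketch under stated assumptions: the abstract does not specify how the trained orchestrator scores experts, so the function `pick_expert`, the expert names, the predicted accuracies, and the cost units below are all hypothetical.

```python
def pick_expert(predicted_accuracy, cost, budget):
    """Return the affordable expert with the highest predicted accuracy.

    predicted_accuracy and cost map expert name -> number (hypothetical values).
    Ties on accuracy are broken in favour of the cheaper expert.
    """
    affordable = [name for name in predicted_accuracy if cost[name] <= budget]
    if not affordable:
        raise ValueError("no expert fits the budget")
    return max(affordable, key=lambda name: (predicted_accuracy[name], -cost[name]))


# Hypothetical per-query estimates; a real orchestrator would predict these.
predicted_accuracy = {"small-7b": 0.55, "math-13b": 0.80, "general-34b": 0.78}
cost = {"small-7b": 1.0, "math-13b": 2.0, "general-34b": 5.0}

choice_tight = pick_expert(predicted_accuracy, cost, budget=1.0)
choice_loose = pick_expert(predicted_accuracy, cost, budget=5.0)
print(choice_tight, choice_loose)
```

Raising the budget only changes the choice when a more expensive expert is also predicted to be more accurate for that query, which is the intuition behind matching a larger model's accuracy at lower cost.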

Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models

Nov 14, 2023

Alireza Mohammadshahi, Jannis Vamvas, Rico Sennrich

Abstract: Massively multilingual machine translation models allow for the translation of a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has used a simple averaging of probability distributions from multiple paths, but we find that this performs worse than using a single pivot, and exacerbates the hallucination problem because the same hallucinations can be probable across different paths. As an alternative, we propose MaxEns, a combination strategy that is biased towards the most confident predictions, hypothesising that confident predictions are less prone to be hallucinations. We evaluate different strategies on the FLORES benchmark for 20 low-resource language directions, demonstrating that MaxEns improves translation quality for low-resource languages while reducing hallucination in translations, compared to both direct translation and an averaging approach. On average, multi-pivot strategies still lag behind using English as a single pivot language, raising the question of how to identify the best pivoting strategy for a given translation direction.
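The difference between averaging and a confidence-biased combination can be sketched on toy next-token distributions. The abstract does not give the MaxEns formula, so the rule below (element-wise maximum over pivot paths, then renormalisation) is an assumption chosen for illustration and may differ from the paper's definition; the three distributions are made-up numbers.

```python
def average_ensemble(dists):
    """Uniform average of per-path next-token distributions."""
    n = len(dists)
    vocab = len(dists[0])
    return [sum(d[i] for d in dists) / n for i in range(vocab)]


def max_ensemble(dists):
    """Assumed confidence-biased rule: element-wise max, then renormalise."""
    vocab = len(dists[0])
    peak = [max(d[i] for d in dists) for i in range(vocab)]
    total = sum(peak)
    return [p / total for p in peak]


# Toy distributions over a 3-token vocabulary, one per pivot path.
# Path 0 is confident about token 0; the other two paths are diffuse.
paths = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.2, 0.3, 0.5],
]

averaged = average_ensemble(paths)
maxed = max_ensemble(paths)
print(averaged)
print(maxed)
```

In this toy case the token backed by the single confident path receives more probability mass under the max-based rule than under plain averaging.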

Transformers as Graph-to-Graph Models

Oct 27, 2023

James Henderson, Alireza Mohammadshahi, Andrei C. Coman, Lesly Miculicich

Abstract: We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.

* Accepted to Big Picture workshop at EMNLP 2023

Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding

Sep 13, 2023

Rico Sennrich, Jannis Vamvas, Alireza Mohammadshahi

Abstract: Hallucinations and off-target translation remain unsolved problems in machine translation, especially for low-resource languages and massively multilingual models. In this paper, we introduce methods to mitigate both failure cases with a modified decoding objective, without requiring retraining or external models. In source-contrastive decoding, we search for a translation that is probable given the correct input, but improbable given a random input segment, hypothesising that hallucinations will be similarly probable given either. In language-contrastive decoding, we search for a translation that is probable, but improbable given the wrong language indicator token. In experiments on M2M-100 (418M) and SMaLL-100, we find that these methods effectively suppress hallucinations and off-target translations, improving chrF2 by 1.7 and 1.4 points on average across 57 tested translation directions. In a proof of concept on English-German, we also show that we can suppress off-target translations with the Llama 2 chat models, demonstrating the applicability of the method to machine translation with LLMs. We release our source code at https://github.com/ZurichNLP/ContraDecode.
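The source-contrastive idea can be sketched for a single decoding step: reward tokens that are probable given the real source and penalise tokens that are also probable given a random source. This is a minimal illustration, not the released ContraDecode code; the function name `contrastive_scores`, the weight `lam`, and the toy probabilities are assumptions.

```python
import math


def contrastive_scores(logp_source, logp_random, lam=0.7):
    """Score each candidate token as log p(y | source) - lam * log p(y | random source)."""
    return {tok: logp_source[tok] - lam * logp_random[tok] for tok in logp_source}


# Made-up next-token probabilities for one decoding step.
# "the" is likely regardless of the input, which is the signature of a
# hallucinated or off-target continuation under the abstract's hypothesis.
logp_source = {"Haus": math.log(0.45), "the": math.log(0.50), "Baum": math.log(0.05)}
logp_random = {"Haus": math.log(0.05), "the": math.log(0.60), "Baum": math.log(0.35)}

greedy_choice = max(logp_source, key=logp_source.get)
scores = contrastive_scores(logp_source, logp_random)
contrastive_choice = max(scores, key=scores.get)
print(greedy_choice, contrastive_choice)
```

Language-contrastive decoding follows the same pattern, with the contrastive distribution obtained by conditioning on the wrong language indicator token instead of a random source segment.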

RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Nov 09, 2022

Alireza Mohammadshahi, Thomas Scialom, Majid Yazdani, Pouya Yanki, Angela Fan, James Henderson, Marzieh Saeidi

Abstract: Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer module, in which we use pre-trained models from the existing literature, and therefore, our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment without relying on the reference question. RQUGE is shown to be significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on the synthetic data generated by a question generation model and re-ranked by RQUGE.

* 19 pages, 8 figures

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers

May 23, 2022

Luis Espinosa-Anke, Alexander Shvets, Alireza Mohammadshahi, James Henderson, Leo Wanner

Abstract: Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.

* Accepted to *SEM 2022

What Do Compressed Multilingual Machine Translation Models Forget?

May 22, 2022

Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

Abstract: Recently, very large pre-trained models have achieved state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques can drastically reduce the size of a model, and therefore its inference time, with negligible impact on top-tier metrics. However, the general performance hides a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the model. In this work, we analyze the impact of compression methods on Multilingual Neural Machine Translation (MNMT) models for various language groups and semantic features through extensive analysis of compressed models on different NMT benchmarks, e.g. FLORES-101, MT-Gender, and DiBiMT. Our experiments show that the performance of under-represented languages drops significantly, while the average BLEU metric decreases only slightly. Interestingly, the removal of noisy memorization by compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.

* 20 pages, 12 Figures, 9 Tables

The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task

Jul 05, 2021

James Barry, Alireza Mohammadshahi, Joachim Wagner, Jennifer Foster, James Henderson

Abstract: We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to better represent semantic structure. Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data from each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLM-RoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score, and our final system has a coarse ELAS of 88.04.

* Submitted to the IWPT 2021 Shared Task: From Raw Text to Enhanced Universal Dependencies: the Parsing Shared Task at IWPT 2021

Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling

Apr 15, 2021

Alireza Mohammadshahi, James Henderson

Abstract: The goal of semantic role labelling (SRL) is to recognise the predicate-argument structure of a sentence. Recent models have shown that syntactic information can enhance SRL performance, but syntax-agnostic approaches have also achieved reasonable performance. The best way to encode syntactic information for the SRL task is still an open question. In this paper, we propose the Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) architecture, which encodes the syntactic structure using a novel way to input graph relations as embeddings directly into the self-attention mechanism of the Transformer. This approach adds a soft bias towards attention patterns that follow the syntactic structure but also allows the model to use this information to learn alternative patterns. We evaluate our model on both dependency-based and span-based SRL datasets, and outperform all previous syntax-aware and syntax-agnostic models in both in-domain and out-of-domain settings, on the CoNLL 2005 and CoNLL 2009 datasets. Our architecture is general and can be applied to encode any graph information for a desired downstream task.
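A rough sketch of how a graph relation can act as a soft bias inside self-attention: each (query token, key token) pair looks up an embedding for its relation label, and that embedding is added to the key before the dot product. This is a simplified illustration under assumptions; the abstract does not give the exact parameterisation, and the vectors, relation labels, and function names below are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention_weights(queries, keys, rel_ids, rel_emb):
    """Attention weights where key j is shifted by the embedding of relation (i, j)."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            r = rel_emb[rel_ids[i][j]]
            biased_key = [kv + rv for kv, rv in zip(k, r)]
            scores.append(sum(a * b for a, b in zip(q, biased_key)) / math.sqrt(d))
        weights.append(softmax(scores))
    return weights


# Three tokens with identical toy queries and keys, so attention without
# relation embeddings would be uniform.
queries = [[1.0, 0.0]] * 3
keys = [[1.0, 0.0]] * 3
# Relation 0 = "no edge", relation 1 = "syntactic head" (hypothetical labels).
rel_emb = {0: [0.0, 0.0], 1: [2.0, 0.0]}
rel_ids = [
    [0, 0, 1],  # token 0 has token 2 as its head
    [0, 0, 0],
    [0, 0, 0],
]

weights = graph_attention_weights(queries, keys, rel_ids, rel_emb)
for row in weights:
    print([round(w, 3) for w in row])
```

Because the relation only shifts the scores before the softmax, it biases attention towards syntactically related tokens without forcing it: a learned query could still place weight elsewhere.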

* 13 pages, 4 figures

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Mar 29, 2020

Alireza Mohammadshahi, James Henderson

Abstract: We propose the Recursive Non-autoregressive Graph-to-graph Transformer architecture (RNG-Tr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. The Graph-to-Graph Transformer architecture of Mohammadshahi and Henderson (2019) has previously been used for autoregressive graph prediction, but here we use it to predict all edges of the graph independently, conditioned on a previous prediction of the same graph. We demonstrate the power and effectiveness of RNG-Tr on several dependency corpora, using a refinement model pre-trained with BERT (Devlin et al., 2018). We also introduce Dependency BERT (DepBERT), a non-recursive parser similar to our refinement model. RNG-Tr is able to improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks and the English and Chinese Penn Treebanks, even improving over the new state-of-the-art results achieved by DepBERT, significantly improving the state-of-the-art for all corpora tested.
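The recursive refinement described above amounts to a loop that feeds each predicted graph back in as conditioning for the next prediction. The sketch below shows only that control flow; the stopping rule (a fixed point or a maximum number of iterations) is an assumption, and `toy_refine_step` is a hypothetical stand-in for the trained model that simply repairs one head per call against a fixed target.

```python
def iterative_refinement(tokens, initial_heads, refine_step, max_iters=3):
    """Re-predict the dependency heads, conditioning each pass on the previous graph."""
    heads = list(initial_heads)
    for _ in range(max_iters):
        new_heads = refine_step(tokens, heads)
        if new_heads == heads:  # fixed point: the graph no longer changes
            break
        heads = new_heads
    return heads


# Heads are 1-based token indices; 0 marks the root.
TARGET = [2, 0, 2]


def toy_refine_step(tokens, heads):
    """Hypothetical stand-in for a trained refinement model.

    It repairs the first head that disagrees with TARGET, so the loop
    needs more than one pass to converge.
    """
    new_heads = list(heads)
    for i, (head, gold) in enumerate(zip(heads, TARGET)):
        if head != gold:
            new_heads[i] = gold
            break
    return new_heads


tokens = ["She", "reads", "books"]
initial = [3, 0, 1]  # output of some initial parser, with two wrong heads

one_pass = iterative_refinement(tokens, initial, toy_refine_step, max_iters=1)
refined = iterative_refinement(tokens, initial, toy_refine_step, max_iters=3)
print(one_pass, refined)
```

The initial graph can come from any parser, which matches the abstract's claim that the refinement improves a variety of initial parsers.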
