Daniel Korat

Distributed Speculative Inference of Large Language Models

May 23, 2024

Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel

Abstract:Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) (Leviathan et al., 2023; Chen et al., 2023; Miao et al., 2023) and traditional autoregressive inference (non-SI). Like other SI algorithms, DSI works on frozen LLMs, requiring no training or architectural modifications, and it preserves the target distribution. Prior studies on SI have demonstrated empirical speedups (compared to non-SI) but require a fast and accurate drafter LLM. In practice, off-the-shelf LLMs often do not have matching drafters that are sufficiently fast and accurate. We show a gap: SI gets slower than non-SI when using slower or less accurate drafters. We close this gap by proving that DSI is faster than both SI and non-SI given any drafters. By orchestrating multiple instances of the target and drafters, DSI is not only faster than SI but also supports LLMs that cannot be accelerated with SI. Our simulations show speedups of off-the-shelf LLMs in realistic settings: DSI is 1.29-1.92x faster than SI.
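
The gap between SI and non-SI mentioned in this abstract can be illustrated with the simplified latency model commonly used in the SI literature it cites, in which each drafted token is accepted independently with the same probability. The sketch below covers SI versus non-SI only; it is not DSI and not the paper's own analysis. The function name and the parameters `alpha`, `cost_ratio`, and `lookahead` are illustrative assumptions.

```python
def si_speedup(alpha: float, cost_ratio: float, lookahead: int) -> float:
    """Expected speedup of speculative inference over plain autoregressive decoding.

    alpha      -- probability that a drafted token is accepted (assumed i.i.d.)
    cost_ratio -- drafter forward-pass latency divided by target forward-pass latency
    lookahead  -- number of tokens drafted before each target verification
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Expected tokens produced per iteration: 1 + alpha + ... + alpha**lookahead.
    expected_tokens = (1.0 - alpha ** (lookahead + 1)) / (1.0 - alpha)
    # One iteration costs `lookahead` drafter passes plus one target pass,
    # measured in units of a single target pass.
    iteration_cost = lookahead * cost_ratio + 1.0
    return expected_tokens / iteration_cost


if __name__ == "__main__":
    # Hypothetical settings: a fast, accurate drafter and a slow, inaccurate one.
    print(si_speedup(alpha=0.9, cost_ratio=0.05, lookahead=5))
    print(si_speedup(alpha=0.2, cost_ratio=0.5, lookahead=5))
```

A value above 1 means SI is faster than non-SI under this model; a slow or inaccurate drafter pushes the ratio below 1.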

Accelerating Speculative Decoding using Dynamic Speculation Length

May 07, 2024

Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz

Abstract:Speculative decoding is a promising method for reducing the inference latency of large language models. The effectiveness of the method depends on the speculation length (SL) - the number of tokens generated by the draft model at each iteration. The vast majority of speculative decoding approaches use the same SL for all iterations. In this work, we show that this practice is suboptimal. We introduce DISCO, a DynamIc SpeCulation length Optimization method that uses a classifier to dynamically adjust the SL at each iteration, while provably preserving the decoding quality. Experiments with four benchmarks demonstrate average speedup gains of 10.3% relative to our best baselines.
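
As a rough illustration of where the speculation length enters the loop, here is a minimal sketch of greedy speculative decoding in which the SL is chosen anew at each iteration. It is not the DISCO method: the `sl_policy` callback merely stands in for whatever decides the SL (a classifier in the paper), and `toy_target`, `toy_draft`, and all other names are hypothetical. Under greedy decoding, the output matches plain autoregressive decoding with the target model for any SL schedule.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token prefix to its greedy next token


def autoregressive_decode(target: Model, prompt: List[int], num_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       num_tokens: int, sl_policy: Callable[[int], int]) -> List[int]:
    """Greedy speculative decoding with a per-iteration speculation length."""
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    iteration = 0
    while len(tokens) < goal:
        sl = max(1, sl_policy(iteration))
        # Draft `sl` tokens autoregressively with the cheap model.
        drafted: List[int] = []
        for _ in range(sl):
            drafted.append(draft(tokens + drafted))
        # Verify: keep drafted tokens while they match the target's greedy choice.
        # (A real system scores all drafted positions in one target forward pass.)
        accepted: List[int] = []
        for tok in drafted:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(tokens + accepted))  # extra token when all match
        tokens.extend(accepted)
        iteration += 1
    return tokens[:goal]


def toy_target(prefix: List[int]) -> int:
    return (sum(prefix) * 31 + len(prefix)) % 7


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except when the prefix length is a multiple of 4.
    guess = toy_target(prefix)
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess
```

Calling `speculative_decode(toy_target, toy_draft, [1, 2], 20, lambda i: 3)` uses a fixed SL, while a policy such as `lambda i: 1 + i % 5` varies it per iteration; only the number of drafted and wasted tokens changes, not the decoded sequence.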

Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs

Oct 18, 2022

Phillip Howard, Arden Ma, Vasudev Lal, Ana Paula Simoes, Daniel Korat, Oren Pereg, Moshe Wasserblat, Gadi Singer

Abstract:The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text. Existing approaches for this task have yielded impressive results when the training and testing data are from the same domain. However, these methods show a drastic decrease in performance when applied to cross-domain settings where the domain of the testing data differs from that of the training data. To address this lack of extensibility and robustness, we propose a novel approach for automatically constructing domain-specific knowledge graphs that contain information relevant to the identification of aspect terms. We introduce a methodology for injecting information from these knowledge graphs into Transformer models, including two alternative mechanisms for knowledge insertion: via query enrichment and via manipulation of attention patterns. We demonstrate state-of-the-art performance on benchmark datasets for cross-domain aspect term extraction using our approach and investigate how the amount of external knowledge available to the Transformer impacts model performance.

* Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM 2022). Association for Computing Machinery, New York, NY, USA, 780-790

Efficient Few-Shot Learning Without Prompts

Sep 22, 2022

Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg

Abstract:Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques. Our experiments show that SetFit obtains results comparable to PEFT and PET techniques, while being an order of magnitude faster to train. We also show that SetFit can be applied in multilingual settings by simply switching the ST body. Our code is available at https://github.com/huggingface/setfit and our datasets at https://huggingface.co/setfit .
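
The first stage described in this abstract, fine-tuning on text pairs in a contrastive manner, needs pairs built from the few labeled examples. The sketch below shows one simple way to do that using only the standard library. It does not use the SetFit library, and the exhaustive pairing scheme, the function name, and the type aliases are assumptions made for illustration; the abstract does not specify how pairs are sampled.

```python
import itertools
import random
from typing import List, Tuple

Example = Tuple[str, str]       # (text, label)
Pair = Tuple[str, str, float]   # (text_a, text_b, similarity target)


def make_contrastive_pairs(examples: List[Example], seed: int = 0) -> List[Pair]:
    """Build positive (same label) and negative (different label) text pairs."""
    pairs: List[Pair] = []
    for (text_a, label_a), (text_b, label_b) in itertools.combinations(examples, 2):
        # Target 1.0 pulls same-label texts together; 0.0 pushes others apart.
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    # Shuffle deterministically so training batches mix positives and negatives.
    random.Random(seed).shuffle(pairs)
    return pairs


if __name__ == "__main__":
    few_shot = [
        ("great battery life", "positive"),
        ("love the screen", "positive"),
        ("stopped working in a week", "negative"),
        ("terrible support", "negative"),
    ]
    for pair in make_contrastive_pairs(few_shot):
        print(pair)
```

Pairs of this form could then be fed to a Siamese fine-tuning objective for the sentence encoder, after which the encoder's embeddings of the original examples would serve as features for the classification head.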

3D Neural Network for Lung Cancer Risk Prediction on CT Volumes

Jul 25, 2020

Daniel Korat

Abstract:With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States. Lung cancer CT screening has been shown to reduce mortality by up to 40% and is now included in US screening guidelines. Reducing the high error rates in lung cancer screening is imperative because of the high clinical and financial costs caused by diagnostic mistakes. Despite the use of standards for radiological diagnosis, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain limitations of current methods. These limitations suggest opportunities for more sophisticated systems to improve performance and inter-reader consistency. In this report, we reproduce a state-of-the-art deep learning algorithm for lung cancer risk prediction. Our model predicts malignancy probability and risk bucket classification from lung CT studies. This allows for risk categorization of patients being screened and suggests the most appropriate surveillance and management. With its high accuracy, consistency and fully automated nature, our approach may enable highly efficient screening procedures and accelerate the adoption of lung cancer screening.

ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System

Sep 12, 2019

Oren Pereg, Daniel Korat, Moshe Wasserblat, Jonathan Mamou, Ido Dagan

Abstract:We present ABSApp, a portable system for weakly-supervised aspect-based sentiment extraction. The system is interpretable and user friendly and does not require labeled training data, hence can be rapidly and cost-effectively used across different domains in applied setups. The system flow includes three stages: First, it generates domain-specific aspect and opinion lexicons based on an unlabeled dataset; second, it enables the user to view and edit those lexicons (weak supervision); and finally, it enables the user to select an unlabeled target dataset from the same domain, classify it, and generate an aspect-based sentiment report. ABSApp has been successfully used in a number of real-life use cases, among them movie review analysis and convention impact analysis.

* 6 pages, demo paper at EMNLP 2019

Term Set Expansion based NLP Architect by Intel AI Lab

Oct 15, 2018

Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Alon Eirew, Yael Green, Shira Guskin, Peter Izsak, Daniel Korat

Abstract:We present SetExpander, a corpus-based system for expanding a seed set of terms into a more complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to-end workflow. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes. SetExpander has been used successfully in real-life use cases including integration into an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv (some images were blurred for privacy reasons).

* EMNLP 2018 System Demonstrations. arXiv admin note: substantial text overlap with arXiv:1807.10104

Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow

Jul 26, 2018

Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Ido Dagan, Yoav Goldberg, Alon Eirew, Yael Green, Shira Guskin, Peter Izsak, Daniel Korat

Abstract:We present SetExpander, a corpus-based system for expanding a seed set of terms into a more complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to-end workflow for term set expansion. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes. SetExpander has been used for solving real-life use cases including integration in an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv (some images were blurred for privacy reasons).

* COLING 2018 System Demonstration paper
