Shichen Dong

QAQ: Quality Adaptive Quantization for LLM KV Cache

Mar 07, 2024

Shichen Dong, Wen Cheng, Jiayu Qin, Wei Wang

Abstract: The emergence of LLMs has ignited a fresh surge of breakthroughs in NLP applications, particularly in domains such as question-answering systems and text generation. As the need for longer context grows, a significant bottleneck in model deployment emerges because the Key-Value (KV) cache grows linearly with the context length. Existing methods primarily rely on various hypotheses, such as sorting the KV cache based on attention scores for replacement or eviction, to compress the KV cache and improve model throughput. However, the heuristics used by these strategies may wrongly evict essential KV cache entries, which can significantly degrade model performance. In this paper, we propose QAQ, a Quality Adaptive Quantization scheme for the KV cache. We theoretically demonstrate that the key cache and the value cache exhibit distinct sensitivities to quantization, leading to the formulation of separate quantization strategies for their non-uniform quantization. Through the integration of dedicated outlier handling, as well as an improved attention-aware approach, QAQ achieves up to a 10x compression ratio of the KV cache size with a negligible impact on model performance. QAQ significantly reduces the practical hurdles of deploying LLMs, opening up new possibilities for longer-context applications. The code is available at github.com/ClubieDong/KVCacheQuantization.
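
As a rough illustration of the general idea of quantizing cached values while keeping outliers at full precision, the sketch below applies plain uniform min-max quantization to a list of floats. It is not the QAQ algorithm: the abstract describes non-uniform, attention-aware quantization, and none of that is reproduced here. The function names, the `bits` and `outlier_fraction` parameters, and the tail-based outlier rule are all assumptions made for this example.

```python
def quantize_with_outliers(values, bits=4, outlier_fraction=0.1):
    """Uniformly quantize `values` to `bits` bits; keep extreme values exact.

    Hypothetical helper: assumes a non-empty list and outlier_fraction < 1.
    """
    n = len(values)
    k = int(n * outlier_fraction / 2)  # number of outliers kept on each tail
    order = sorted(range(n), key=lambda i: values[i])
    outlier_idx = set(order[:k] + order[n - k:]) if k > 0 else set()

    # The quantization range is computed from the inliers only, so a few
    # extreme values do not stretch the grid.
    inliers = [values[i] for i in range(n) if i not in outlier_idx]
    lo, hi = min(inliers), max(inliers)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0

    codes = [
        0 if i in outlier_idx else round((values[i] - lo) / scale)
        for i in range(n)
    ]
    outliers = {i: values[i] for i in outlier_idx}  # stored at full precision
    return codes, lo, scale, outliers


def dequantize(codes, lo, scale, outliers):
    """Rebuild approximate values; outliers are restored exactly."""
    return [outliers.get(i, lo + c * scale) for i, c in enumerate(codes)]
```

With the outliers set aside, each remaining value is reconstructed to within half a quantization step (`scale / 2`), which is why isolating a few extreme entries can matter at low bit widths.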

W2KPE: Keyphrase Extraction with Word-Word Relation

Mar 22, 2023

Wen Cheng, Shichen Dong, Wei Wang

Abstract: This paper describes our submission to ICASSP 2023 MUG Challenge Track 4, Keyphrase Extraction, which aims to extract keyphrases most relevant to the conference theme from conference materials. We model the challenge as a single-class Named Entity Recognition task and develop techniques for better performance on the challenge. For data preprocessing, we encode the split keyphrases after word segmentation. In addition, we increase the amount of input information that the model can accept at one time by fusing multiple preprocessed sentences into one segment. We replace the loss function with the multi-class focal loss to address the sparseness of keyphrases. Besides, we score each appearance of keyphrases and add an extra output layer to fit the score to rank keyphrases. Exhaustive evaluations are performed to find the best combination of the word segmentation tool, the pre-trained embedding model, and the corresponding hyperparameters. With these proposals, we scored 45.04 on the final test set.
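
For readers unfamiliar with the multi-class focal loss mentioned above, here is a minimal sketch of its commonly used form for a single example, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The abstract does not give the submission's exact formulation or hyperparameters, so the function names, the `gamma` default, and the optional per-class `alpha` weights are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example (hypothetical helper).

    gamma = 0 and alpha = None reduce this to ordinary cross-entropy.
    """
    probs = softmax(logits)
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma down-weights examples the model already gets right,
    # so rare, hard classes contribute relatively more to the total loss.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

Compared with cross-entropy, a confidently correct prediction contributes much less loss, which is the usual motivation for using this loss when positive labels are sparse.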
