Jinsol Park

CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

Aug 08, 2024

Sophia Ho, Jinsol Park, Patrick Wang

Abstract: We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datastore with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST using the same storage space on the HumanEval and MT Bench benchmarks.
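
The retrieval-and-compaction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`build_datastore`, `compact`, `draft`), the dictionary-based datastore, the single-token drafts, and the "keep the shortest, most frequent contexts" rule are all assumptions made for illustration, and the paper's actual data structures and selection criteria may differ.

```python
from collections import Counter, defaultdict


def build_datastore(corpus, max_n=3):
    """Map each context n-gram (length 1..max_n) to a Counter of next tokens."""
    store = defaultdict(Counter)
    for i in range(1, len(corpus)):
        for n in range(1, max_n + 1):
            if i - n < 0:
                break
            store[tuple(corpus[i - n:i])][corpus[i]] += 1
    return store


def compact(store, max_n, keep):
    """Hypothetical compaction: keep contexts of length <= max_n,
    then only the `keep` most frequent of those."""
    small = {ctx: nxt for ctx, nxt in store.items() if len(ctx) <= max_n}
    ranked = sorted(small, key=lambda ctx: (-sum(small[ctx].values()), len(ctx), ctx))
    return {ctx: small[ctx] for ctx in ranked[:keep]}


def draft(store, recent, max_n=3):
    """Draft one token from the longest exact suffix match of `recent`, or None."""
    for n in range(min(max_n, len(recent)), 0, -1):
        ctx = tuple(recent[-n:])
        if ctx in store:
            return store[ctx].most_common(1)[0][0]
    return None


corpus = "the cat sat on the mat the cat ran".split()
full = build_datastore(corpus, max_n=3)
tiny = compact(full, max_n=1, keep=2)

print(len(full), "contexts in the full datastore")
print(len(tiny), "contexts after compaction")
print(draft(full, ["on", "the"]), draft(tiny, ["on", "the"]))
```

In this sketch the compacted store falls back to a shorter context when the longer one has been dropped, so it can still propose a draft token while holding far fewer entries. In speculative decoding the target LLM would then verify the drafted tokens; that verification step is omitted here.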

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering

Oct 24, 2023

Wookje Han, Jinsol Park, Kyungjae Lee

Abstract: Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.

* 11 pages, 3 figures; accepted to EMNLP 2023 (short)
