Runsong Zhao

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Oct 07, 2024

Xinyu Liu, Runsong Zhao, Pengcheng Huang, Chunyang Xiao, Bei Li, Jingang Wang, Tong Xiao, Jingbo Zhu

Abstract: Numerous recent works aim to extend the effective context length of language models, and various methods, tasks, and benchmarks exist to measure a model's effective memorization length. However, through thorough investigation, we find limitations in existing evaluations of models' memorization capability. We provide an extensive survey of these limitations and propose a new method, called the forgetting curve, to measure the memorization capability of long-context models. We show that the forgetting curve is robust to the tested corpus and the experimental settings, does not rely on prompts, and can be applied to models of any size. We apply the forgetting curve to a large variety of models, covering both transformer and RNN/SSM-based architectures. Our measurements provide empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM-based models. We also examine the differences between our measurement and existing benchmarks as well as popular metrics for various models. Our code and results can be found at https://github.com/1azybug/ForgettingCurve.

More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss

Sep 22, 2024

Runsong Zhao, Pengcheng Huang, Xinyu Liu, Chunyang Xiao, Tong Xiao, Jingbo Zhu

Abstract: Compressing Transformer inputs into compressed tokens allows LLMs to run with improved speed and cost efficiency. Building on the compression method ICAE, we carefully examine the choice of position identifiers for compressed tokens and propose a new compression loss. We demonstrate empirically that our proposed methods achieve significantly higher compression ratios (15x compared to 4x for ICAE) while attaining comparable reconstruction performance.
