Title: Vision-centric Token Compression in Large Language Model

Feb 04, 2025

Ling Xing, Alex Jinpeng Wang, Rui Yan, Jinhui Tang

Abstract: Large Language Models (LLMs) have revolutionized natural language processing and excel at handling longer sequences. However, the inefficiency and redundancy of processing extended in-context tokens remain a challenge. Many attempts to address this compress tokens with smaller text encoders, yet we question whether text encoders are truly indispensable. Our investigation leads to an unexpected discovery: a much smaller vision encoder, applied directly to sequences of text tokens, can rival text encoders on text tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small text understanding benchmarks, our method, VIST, achieves comparable results with 16% fewer FLOPs and 50% less memory usage. We further uncover significant token redundancy and devise a frequency-based masking strategy that guides the visual encoder toward the most critical tokens. Interestingly, we observe that the trained visual encoder behaves like a summarizer, selectively ignoring less important words such as prepositions and conjunctions. This approach outperforms traditional text encoder-based methods by 5.7% on average over benchmarks including TriviaQA, NQ, PopQA, TREC, SST2, and SST5, setting a new standard for token efficiency in LLMs.
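The abstract does not say how the frequency-based masking is implemented. As a rough illustration of the general idea only, the sketch below masks the tokens that occur most often in a reference corpus (typically function words such as prepositions and conjunctions) and keeps the rarer ones. The function name `frequency_mask`, the `keep_ratio` parameter, the `[MASK]` placeholder, and the toy corpus are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def frequency_mask(tokens, corpus_counts, keep_ratio=0.5, mask_token="[MASK]"):
    """Mask the most frequent tokens, keeping the rarest `keep_ratio` share.

    Hypothetical helper: illustrates frequency-based masking in general,
    not the exact strategy used by VIST.
    """
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank positions by corpus frequency, rarest first; ties keep original order.
    ranked = sorted(range(len(tokens)), key=lambda i: (corpus_counts[tokens[i]], i))
    keep = set(ranked[:n_keep])
    return [tok if i in keep else mask_token for i, tok in enumerate(tokens)]


# Toy reference corpus (assumed for illustration) used to estimate token frequencies.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
counts = Counter(corpus)

sentence = "the cat sat on the mat".split()
masked = frequency_mask(sentence, counts, keep_ratio=0.5)
print(masked)
```

In this toy run the frequent word "the" is masked first, while rarer content words such as "cat" and "mat" are kept, which mirrors the summarizer-like behavior the abstract describes.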
