Title: Core Tokensets for Data-efficient Sequential Training of Transformers

Oct 08, 2024

Subarnaduti Paul, Manuel Brack, Patrick Schramowski, Kristian Kersting, Martin Mundt

Abstract: Deep networks are frequently tuned to novel tasks and continue learning from ongoing data streams. Such sequential training requires consolidation of new and past information, a challenge predominantly addressed by retaining the most important data points, formally known as coresets. Traditionally, these coresets consist of entire samples, such as images or sentences. However, recent transformer architectures operate on tokens, leading to the famous assertion that an image is worth 16x16 words. Intuitively, not all of these tokens are equally informative or memorable. Going beyond coresets, we thus propose to construct a deeper-level data summary on the level of tokens. Our accordingly named core tokensets both select the most informative data points and leverage feature attribution to store only their most relevant features. We demonstrate that core tokensets yield significant performance retention in incremental image classification, open-ended visual question answering, and continual image captioning with significantly reduced memory. In fact, we empirically find that a core tokenset of 1% of the data performs comparably to a coreset that is at least twice as large and up to 10 times larger.
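The two-stage selection described in the abstract (first pick informative samples, then keep only their most relevant tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_core_tokenset`, the precomputed `sample_scores` and `token_scores`, and the two fraction parameters are all assumptions made for illustration. The abstract does not say which importance measure or feature attribution method is used, so the scores are simply taken as given inputs.

```python
from typing import List, Sequence, Tuple


def build_core_tokenset(
    samples: Sequence[Sequence[object]],
    sample_scores: Sequence[float],
    token_scores: Sequence[Sequence[float]],
    sample_fraction: float,
    token_fraction: float,
) -> List[Tuple[int, List[int], List[object]]]:
    """Hypothetical sketch: keep top-scoring samples, then their top-scoring tokens.

    Returns one (sample index, kept token positions, kept tokens) triple
    per retained sample. All scores are assumed to be precomputed.
    """
    if not (len(samples) == len(sample_scores) == len(token_scores)):
        raise ValueError("samples, sample_scores and token_scores must align")

    # Stage 1 (coreset-style): rank whole samples by importance, keep a fraction.
    n_keep = max(1, int(sample_fraction * len(samples)))
    ranked = sorted(range(len(samples)), key=lambda i: sample_scores[i], reverse=True)
    kept_samples = sorted(ranked[:n_keep])

    core_tokenset = []
    for i in kept_samples:
        tokens, scores = samples[i], token_scores[i]
        if len(tokens) != len(scores):
            raise ValueError(f"sample {i}: tokens and token scores must align")

        # Stage 2 (token level): keep only the tokens with the highest attribution.
        k = max(1, int(token_fraction * len(tokens)))
        top = sorted(range(len(tokens)), key=lambda t: scores[t], reverse=True)[:k]
        positions = sorted(top)  # keep positions so token order can be restored
        core_tokenset.append((i, positions, [tokens[t] for t in positions]))

    return core_tokenset
```

Storing the kept positions alongside the tokens is a design choice of this sketch: a transformer that replays the stored tokens would typically still need their positional information.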
