
Title: Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation

Nov 15, 2024

Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt



Abstract: In language processing, transformers benefit greatly from condensed text. This condensation comes from a larger vocabulary that captures word fragments instead of individual characters, and such a vocabulary is most often built with Byte Pair Encoding. For images, tokenisation is usually limited to regular grids obtained from quantisation methods, without global content awareness. Our work improves the tokenisation of visual data by extending Byte Pair Encoding from 1D to multiple dimensions, as a complementary add-on to existing compression. We achieve this by counting constellations of token pairs and replacing the most frequent pair with a newly introduced token. The multidimensionality only increases computation time by a factor of 2 for images, making the method applicable even to large datasets such as ImageNet within minutes on consumer hardware. It is a lossless preprocessing step. Our evaluation shows that compressing frequent constellations of tokens improves the training and inference performance of transformers on visual data: the resulting sequences are shorter and their information content is more uniformly distributed, e.g. empty regions of an image are condensed into single tokens. As our experiments show, these condensed sequences are easier to process. We additionally introduce a strategy that amplifies this compression further by clustering the vocabulary.
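To make the core idea concrete, here is a minimal sketch of 2D Byte Pair Encoding in plain Python. It is not the authors' implementation, and several choices are assumptions made only for illustration: images are already quantised into a 2D grid of integer token ids; only right and down neighbours (offsets (0, 1) and (1, 0)) form pairs; a merged token is stored at the position of its first cell while the second cell is marked empty (None); and all function names (count_pairs, apply_merge, learn_bpe, encode, decode, seq_len) are hypothetical. The paper's constellation counting and its vocabulary clustering step may work differently.

```python
from collections import Counter
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]  # None marks a cell absorbed by a merge
Offset = Tuple[int, int]          # (dy, dx) from anchor cell to partner cell
Merge = Tuple[int, int, Offset]   # (token_a, token_b, offset)

# Assumption for this sketch: only right and down neighbours form pairs.
OFFSETS: Tuple[Offset, ...] = ((0, 1), (1, 0))


def count_pairs(grids: List[Grid]) -> Counter:
    """Count (a, b, offset) constellations between non-empty cells."""
    counts: Counter = Counter()
    for g in grids:
        h, w = len(g), len(g[0])
        for y in range(h):
            for x in range(w):
                a = g[y][x]
                if a is None:
                    continue
                for dy, dx in OFFSETS:
                    ny, nx = y + dy, x + dx
                    if ny < h and nx < w and g[ny][nx] is not None:
                        counts[(a, g[ny][nx], (dy, dx))] += 1
    return counts


def apply_merge(g: Grid, merge: Merge, new_id: int) -> None:
    """Replace non-overlapping occurrences of `merge` in raster order, in place."""
    a, b, (dy, dx) = merge
    h, w = len(g), len(g[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if ny < h and nx < w and g[y][x] == a and g[ny][nx] == b:
                g[y][x] = new_id  # anchor cell now holds the merged token
                g[ny][nx] = None  # partner cell is absorbed, so the sequence shrinks


def learn_bpe(grids: List[Grid], vocab_size: int, num_merges: int,
              min_count: int = 2) -> Tuple[List[Merge], List[Grid]]:
    """Greedily merge the most frequent pair; new ids start at vocab_size."""
    grids = [[row[:] for row in g] for g in grids]  # do not mutate the input
    merges: List[Merge] = []
    for i in range(num_merges):
        counts = count_pairs(grids)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < min_count:  # no pair is frequent enough to be worth a token
            break
        for g in grids:
            apply_merge(g, pair, vocab_size + i)
        merges.append(pair)
    return merges, grids


def encode(grid: Grid, merges: List[Merge], vocab_size: int) -> Grid:
    """Replay the learned merges, in order, on a new grid."""
    g = [row[:] for row in grid]
    for i, m in enumerate(merges):
        apply_merge(g, m, vocab_size + i)
    return g


def decode(grid: Grid, merges: List[Merge], vocab_size: int) -> Grid:
    """Undo merges in reverse order, which restores the original grid exactly."""
    g = [row[:] for row in grid]
    for i in reversed(range(len(merges))):
        a, b, (dy, dx) = merges[i]
        for y in range(len(g)):
            for x in range(len(g[0])):
                if g[y][x] == vocab_size + i:
                    g[y][x] = a
                    g[y + dy][x + dx] = b
    return g


def seq_len(grid: Grid) -> int:
    """Length of the raster-order token sequence (empty cells are skipped)."""
    return sum(t is not None for row in grid for t in row)
```

As a usage example under these assumptions, a 4x4 image with background token 0 and a 2x2 block of token 1 (so vocab_size=2) shrinks from 16 to 6 tokens after learn_bpe([img], vocab_size=2, num_merges=4), mostly by collapsing the empty background, and decode restores it exactly, mirroring the lossless property described above. Keeping merged tokens at their anchor position preserves the 2D layout so later merges still see spatial neighbours; a pair separated by an absorbed cell is simply not counted in this simplified version.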
