Abstract: Vector quantization is a fundamental technique for compression and large-scale nearest neighbor search. For high-accuracy operating points, multi-codebook quantization associates data vectors with one element from each of multiple codebooks. An example is residual quantization (RQ), which iteratively quantizes the residual error of previous steps. Dependencies between the different parts of the code are, however, ignored in RQ, which leads to suboptimal rate-distortion performance. QINCo recently addressed this inefficiency by using a neural network to determine the quantization codebook in RQ based on the vector reconstruction from previous steps. In this paper we introduce QINCo2, which extends and improves QINCo with (i) improved vector encoding using codeword pre-selection and beam-search, (ii) a fast approximate decoder leveraging codeword pairs to establish accurate short-lists for search, and (iii) an optimized training procedure and network architecture. We conduct experiments on four datasets to evaluate QINCo2 for vector compression and billion-scale nearest neighbor search. We obtain outstanding results in both settings, improving the state-of-the-art reconstruction MSE by 34% for 16-byte vector compression on BigANN, and search accuracy by 24% with 8-byte encodings on Deep1M.
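For readers unfamiliar with residual quantization, the following is a minimal sketch of the greedy RQ encoding loop that the abstract refers to; the codebooks, dimensions, and function names are purely illustrative and do not correspond to the QINCo2 implementation, which additionally conditions each codebook on the partial reconstruction via a neural network.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization (RQ) sketch: at each step, pick the
    codeword closest to the current residual and subtract it.
    `codebooks` is a list of (K, d) arrays; names are illustrative only."""
    residual = x.copy()
    code, reconstruction = [], np.zeros_like(x)
    for C in codebooks:
        # nearest codeword to the remaining residual (squared L2 distance)
        idx = int(np.argmin(np.sum((C - residual) ** 2, axis=1)))
        code.append(idx)
        reconstruction += C[idx]
        residual -= C[idx]
    return code, reconstruction

# toy usage: 4 codebooks of 256 codewords (4-byte code) for a 32-d vector
rng = np.random.default_rng(0)
cbs = [rng.normal(size=(256, 32)) for _ in range(4)]
x = rng.normal(size=32)
code, x_hat = residual_quantize(x, cbs)
```

Because each step only sees the residual, the greedy choice above ignores dependencies between code parts; this is the rate-distortion inefficiency that QINCo and QINCo2 address.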
Abstract: The abilities of large language models (LLMs) have recently progressed to unprecedented levels, paving the way to novel applications in a wide variety of areas. In computer vision, LLMs can be used to prime vision-language tasks such as image captioning and visual question answering when coupled with pre-trained vision backbones. While different approaches have been explored to interface LLMs with ``perceptual backbones'' that process, e.g., visual or audio data, they are often explored for different tasks, different datasets, and using different perceptual backbones and language models, hindering direct comparison of the interfacing mechanisms. To remedy this lack of comparability between methods, we present an extensive experimental evaluation of different interfacing mechanisms, across multiple tasks (including image, video, and audio captioning as well as visual question answering), datasets and backbones, paying special attention to low-data settings. We find that existing mechanisms can improve over state-of-the-art results, and identify a new interfacing mechanism that yields (near) optimal results across different tasks, while obtaining a 4x reduction in training time.
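As context for what an "interfacing mechanism" looks like, here is a minimal sketch of one generic option: projecting frozen backbone features into the LLM's token-embedding space and prepending them as a prefix. This is an assumed, simplified example for illustration only; it is not the specific mechanism identified in the paper, and all module names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    """Generic prefix-style interface (illustrative, not the paper's method):
    map pooled features from a frozen perceptual backbone into a fixed number
    of pseudo-tokens in the LLM embedding space."""
    def __init__(self, feat_dim=1024, llm_dim=4096, n_prefix=16):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim * n_prefix)
        self.n_prefix, self.llm_dim = n_prefix, llm_dim

    def forward(self, feats, text_embeds):
        # feats: (B, feat_dim) pooled backbone features
        # text_embeds: (B, T, llm_dim) token embeddings of the text prompt
        prefix = self.proj(feats).view(-1, self.n_prefix, self.llm_dim)
        # concatenate the visual prefix in front of the text tokens
        return torch.cat([prefix, text_embeds], dim=1)
```

Variants of this idea (cross-attention layers, adapters, learned query tokens) differ mainly in where and how the perceptual features enter the LLM, which is exactly the design space the evaluation compares.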