Abstract: Nearest-neighbor search in large vector databases is crucial for a variety of machine learning applications. This paper introduces a novel method that uses tensor-train (TT) low-rank tensor decomposition to represent point clouds compactly and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and employ density-estimation losses such as the Sliced Wasserstein distance to train TT decompositions, resulting in robust point cloud compression. We reveal an inherent hierarchical structure within TT point clouds that facilitates efficient approximate nearest-neighbor search. We provide detailed insights into the methodology, conduct comprehensive comparisons with existing methods, and demonstrate the effectiveness of the approach in several scenarios, including out-of-distribution (OOD) detection and approximate nearest-neighbor (ANN) search tasks.
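As an illustration of the kind of density-estimation loss mentioned above, the following is a minimal PyTorch sketch (not the paper's implementation) of the Sliced Wasserstein distance between two equally sized point clouds; `tt_sample` in the usage comment is a hypothetical placeholder for drawing points from the TT-parameterized distribution.

```python
# Minimal sketch of a Sliced Wasserstein training loss, assuming both point
# clouds have the same number of points. Names and defaults are illustrative.
import torch

def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor, n_projections: int = 128) -> torch.Tensor:
    """Projection-based approximation of the squared Wasserstein-2 distance.

    x, y: (n_points, dim) point clouds of equal size.
    """
    dim = x.shape[1]
    # Random projection directions on the unit sphere.
    theta = torch.randn(dim, n_projections, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)
    # Project both clouds onto each direction and sort the resulting 1D samples.
    x_proj = torch.sort(x @ theta, dim=0).values
    y_proj = torch.sort(y @ theta, dim=0).values
    # Mean squared difference of sorted projections approximates sliced W2^2.
    return ((x_proj - y_proj) ** 2).mean()

# Usage sketch (tt_sample is hypothetical, not an API from the paper):
# loss = sliced_wasserstein(real_points, tt_sample(batch_size))
# loss.backward()
```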
Abstract: The scaling of neural networks to ever larger data and model sizes necessitates more efficient deep learning algorithms. This paper addresses the memory footprint of neural network training by proposing a modification to how activation tensors are handled in pointwise nonlinearity layers. Traditionally, these layers save the entire input tensor for the backward pass, leading to substantial memory use. Our method saves the output tensor instead, reducing the memory required when the subsequent layer also saves its input tensor. This approach is particularly beneficial for transformer-based architectures such as GPT, BERT, Mistral, and Llama. Applying our method requires the inverse of the nonlinearity; to the best of our knowledge, it cannot be computed analytically, so we instead build accurate approximations using simpler functions. Experimental results confirm that our method significantly reduces memory usage without affecting training accuracy. The implementation is available at https://github.com/PgLoLo/optiacts.
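The sketch below illustrates only the "save the output, not the input" mechanism, using tanh, where the derivative is exactly recoverable from the output (tanh'(x) = 1 - y^2). For GELU/SiLU-style activations the paper builds approximate inverses; that approximation is not reproduced here.

```python
# Toy illustration of saving the activation output for the backward pass.
# All names are illustrative; this is not the paper's implementation.
import torch

class TanhSaveOutput(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = torch.tanh(x)
        # Store the output, which the next layer may need to keep anyway,
        # instead of duplicating the input tensor.
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        # Derivative expressed through the saved output only.
        return grad_out * (1.0 - y * y)

# Usage sketch:
# x = torch.randn(8, 16, requires_grad=True)
# y = TanhSaveOutput.apply(x)
# y.sum().backward()
```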
Abstract: Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, which complicates their deployment and makes training from scratch prohibitively expensive. To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of the fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure. We also customize the forward and backward operations through the TTM-based layer for simplicity and for the stability of further training. The resulting GPT-2-based model stores up to 40% fewer parameters while showing perplexity comparable to the original model. On downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model. The proposed tensorized layers could be used to efficiently pre-train other Transformer models.
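For context, the following is a minimal sketch of how a TT-Matrix layer can apply its factorized weight to an input batch without materializing the full matrix; the core shapes, ranks, and layer sizes are illustrative assumptions, and the paper's customized forward/backward operations are not reproduced here.

```python
# Minimal sketch of a TT-Matrix (TTM) matrix-vector product, assuming cores
# G_k of shape (r_k, m_k, n_k, r_{k+1}) with r_0 = r_d = 1.
import torch

def ttm_linear(x: torch.Tensor, cores: list) -> torch.Tensor:
    """Apply the TT-Matrix defined by `cores` to x of shape (batch, prod_k m_k)."""
    batch = x.shape[0]
    # t keeps shape (current rank, batch * processed output modes, remaining input modes).
    t = x.reshape(1, batch, -1)
    for core in cores:
        r, m, n, r_next = core.shape
        b_eff, m_rest = t.shape[1], t.shape[2] // m
        t = t.reshape(r, b_eff, m, m_rest)
        # Contract the current input mode and rank with the core.
        t = torch.einsum('rbmt,rmns->sbnt', t, core)
        t = t.reshape(r_next, b_eff * n, m_rest)
    return t.reshape(batch, -1)

# Usage sketch with illustrative shapes: a 768 -> 3072 layer factorized as
# m = (4, 4, 8, 6), n = (8, 8, 8, 6), TT ranks (1, 16, 16, 16, 1).
ms, ns, ranks = (4, 4, 8, 6), (8, 8, 8, 6), (1, 16, 16, 16, 1)
cores = [torch.randn(ranks[k], ms[k], ns[k], ranks[k + 1]) * 0.02 for k in range(4)]
x = torch.randn(2, 768)
y = ttm_linear(x, cores)  # shape (2, 3072)
```

The cores hold far fewer parameters than the dense 768x3072 matrix they replace, which is the source of the parameter reduction discussed above.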
Abstract: Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantization of the gradients. We propose a systematic approach to compute an optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. We show that such an approximation can be achieved by computing an optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. Drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and unchanged convergence on several open benchmarks.
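The toy sketch below shows the general idea of storing only a few-bit bin index per element and replacing the activation derivative with a piecewise-constant approximation in the backward pass. The breakpoints and slope values are hand-picked for illustration; the paper computes optimal ones via dynamic programming.

```python
# Toy 2-bit piecewise-constant approximation of GELU'(x); bins and slopes are
# illustrative, not the optimal values produced by the paper's dynamic program.
import torch

class QuantizedGELU(torch.autograd.Function):
    boundaries = torch.tensor([-1.0, 0.0, 1.0])        # 3 breakpoints -> 4 bins
    slopes = torch.tensor([-0.05, 0.25, 0.75, 1.05])   # constant derivative per bin

    @staticmethod
    def forward(ctx, x):
        # Save only a small integer code per element instead of the full input.
        bins = torch.bucketize(x, QuantizedGELU.boundaries.to(x.device))
        ctx.save_for_backward(bins.to(torch.uint8))
        return torch.nn.functional.gelu(x)

    @staticmethod
    def backward(ctx, grad_out):
        (bins,) = ctx.saved_tensors
        slopes = QuantizedGELU.slopes.to(grad_out.device, grad_out.dtype)
        # Multiply by the piecewise-constant derivative looked up from the bin index.
        return grad_out * slopes[bins.long()]

# Usage sketch:
# x = torch.randn(4, 8, requires_grad=True)
# y = QuantizedGELU.apply(x)
# y.sum().backward()
```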