Aleksandr Katrutsa

NNTile: a machine learning framework capable of training extremely large GPT language models on a single node

Apr 17, 2025

Aleksandr Mikhalev, Aleksandr Katrutsa, Konstantin Sozykin, Ivan Oseledets

Abstract: This study presents NNTile, a framework for training large deep neural networks in heterogeneous clusters. NNTile is built on the StarPU library, which implements task-based parallelism and schedules all submitted tasks onto all available processing units (CPUs and GPUs). As a result, any operation needed to train a large neural network can be executed on any CPU core or GPU device, depending on automatic scheduling decisions. This approach shifts the burden of deciding where to compute and when to communicate from a human to an automatic decision maker, whether a simple greedy heuristic or complex AI-based software. The performance of the presented tool for training large language models is demonstrated in extensive numerical experiments.
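To make the "simple greedy heuristic" mentioned in the abstract concrete, here is a minimal sketch of a greedy earliest-finish-time scheduler over CPU and GPU workers. It is not NNTile or StarPU code: the `Worker` class, the `greedy_schedule` function, the task names, and the per-device cost estimates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    kind: str              # "cpu" or "gpu" (hypothetical device classes)
    ready_at: float = 0.0  # time at which the worker becomes free


def greedy_schedule(tasks, workers):
    """Assign each task, in order, to the worker that would finish it earliest.

    tasks: list of (task_name, {"cpu": seconds, "gpu": seconds}).
    Returns a list of (task_name, worker_name, start, finish).
    """
    plan = []
    for name, cost in tasks:
        # Greedy choice: earliest estimated finish time across all workers.
        best = min(workers, key=lambda w: w.ready_at + cost[w.kind])
        start = best.ready_at
        best.ready_at = start + cost[best.kind]
        plan.append((name, best.name, start, best.ready_at))
    return plan


# Illustrative (made-up) cost estimates: matrix products are much faster
# on the GPU, while a small elementwise add is cheap everywhere.
workers = [Worker("cpu0", "cpu"), Worker("gpu0", "gpu")]
tasks = [
    ("gemm_0", {"cpu": 8.0, "gpu": 1.0}),
    ("gemm_1", {"cpu": 8.0, "gpu": 1.0}),
    ("add_0", {"cpu": 0.5, "gpu": 0.4}),
]
plan = greedy_schedule(tasks, workers)
for entry in plan:
    print(entry)
```

In this toy run both matrix products go to the GPU, while the small add goes to the idle CPU because waiting for the GPU would take longer. A real task-based runtime additionally has to account for data dependencies and transfer costs, which this sketch ignores.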

Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

Feb 10, 2024

Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov

Abstract: In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm, Clipped-SGD-UCB, and show, both theoretically and empirically, that in the case of symmetric noise in the reward we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left (T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$ ($\alpha \in (0, 1]$), i.e., we perform better than the general lower bound for bandits with heavy tails would suggest. Moreover, the same bound holds even when the reward distribution does not have an expectation, that is, when $\alpha<0$.
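The general idea of pairing a clipped stochastic-gradient mean estimate with a UCB index can be sketched as follows. This is not the authors' Clipped-SGD-UCB: the function name `clipped_sgd_ucb`, the `1/n` step size, the clipping level, and the `sqrt(2 log t / n)` exploration bonus are all assumptions made only to illustrate the mechanism on rewards with symmetric heavy-tailed noise.

```python
import math
import random


def clipped_sgd_ucb(arms, horizon, clip=1.0, seed=0):
    """Toy UCB loop whose per-arm estimate is updated by clipped SGD.

    arms: list of callables taking a random.Random and returning a reward.
    Returns (estimates, pull_counts).
    """
    rng = random.Random(seed)
    k = len(arms)
    est = [0.0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull every arm once to initialise
        else:
            # Assumed index: estimate plus a standard UCB1-style bonus.
            a = max(
                range(k),
                key=lambda i: est[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = arms[a](rng)
        pulls[a] += 1
        # Gradient of 0.5 * (est - reward)^2, clipped so that a single
        # heavy-tailed sample cannot move the estimate arbitrarily far.
        grad = max(-clip, min(clip, est[a] - reward))
        est[a] -= grad / pulls[a]  # assumed step size 1/n
    return est, pulls


def cauchy_arm(center):
    """Reward = center + standard Cauchy noise (symmetric, no expectation)."""
    return lambda rng: center + math.tan(math.pi * (rng.random() - 0.5))


est, pulls = clipped_sgd_ucb([cauchy_arm(0.0), cauchy_arm(1.0)], horizon=2000)
print(est, pulls)
```

Because the noise is symmetric, the expected clipped gradient vanishes at the centre of the distribution, which is why clipping can still target the right location even when the mean does not exist.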

Memory-efficient particle filter recurrent neural network for object localization

Oct 02, 2023

Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

Abstract: This study proposes a novel memory-efficient recurrent neural network (RNN) architecture designed to solve the object localization problem. The problem is to recover the object's state as it moves through a noisy environment. We take the idea of the classical particle filter and combine it with the GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN model (mePFRNN) is that it requires the same number of parameters to process environments of different sizes. Thus, the proposed mePFRNN architecture consumes less memory to store parameters than the previously proposed PFRNN model. To demonstrate the performance of our model, we test it on symmetric and noisy environments that are especially challenging for filtering algorithms. In our experiments, the mePFRNN model provides more precise localization than the considered competitors and requires fewer trained parameters.

Multiparticle Kalman filter for object localization in symmetric environments

Mar 14, 2023

Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

Abstract: This study considers the object localization problem and proposes a novel multiparticle Kalman filter to solve it in complex and symmetric environments. Two well-known classes of filtering algorithms for the localization problem are Kalman filter-based methods and particle filter-based methods. We consider these classes, demonstrate their complementary properties, and propose a novel filtering algorithm that combines the strengths of both. We evaluate the multiparticle Kalman filter in symmetric and noisy environments, which are especially challenging for both classes of classical methods. We compare the proposed approach with the particle filter, since only this method is feasible when the initial state is unknown. In the considered challenging environments, our method outperforms the particle filter in terms of both localization error and runtime.

NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizers

Sep 29, 2022

Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Ivan Oseledets

Abstract: Classical machine learning models such as deep neural networks are usually trained using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the resulting method, referred to as NAG-GS, are first studied extensively for the minimization of a quadratic function. This analysis yields an optimal step size (or learning rate) in terms of the rate of convergence while ensuring the stability of NAG-GS. This is achieved by a careful analysis of the spectral radius of the iteration matrix and of the covariance matrix at stationarity with respect to all hyperparameters of our method. We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for training machine learning models such as logistic regression, residual networks on standard computer vision datasets, and Transformers on the GLUE benchmark.

* We study Nesterov acceleration for the Stochastic Differential Equation

Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction

Feb 23, 2022

Aleksandr Katrutsa, Sergey Utyuzhnikov, Ivan Oseledets

Abstract: The Dynamic Mode Decomposition has proved to be a very efficient technique for studying dynamic data. It is an entirely data-driven approach that extracts all necessary information from data snapshots, which are commonly assumed to be sampled from measurements. The application of this approach becomes problematic if the available data is incomplete because some smaller-scale dimensions are either missing or unmeasured. Such a setting occurs very often in modeling complex dynamical systems such as power grids, in particular with reduced-order modeling. To take into account the effect of unresolved variables, the optimal prediction approach based on the Mori-Zwanzig formalism can be applied to obtain the most expected prediction under existing uncertainties. This effectively leads to the development of a time-predictive model that accounts for the impact of missing data. In the present paper we provide a detailed derivation of the considered method from the Liouville equation and conclude with the optimization problem that defines the optimal transition operator corresponding to the observed data. In contrast to the existing approach, we consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem, and solve it with a gradient-based optimization method. The gradient of the obtained objective function is computed exactly via automatic differentiation. The numerical experiments illustrate that the considered approach gives practically the same dynamics as the exact Mori-Zwanzig decomposition but is less computationally intensive.
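For readers unfamiliar with the baseline, the standard (exact) Dynamic Mode Decomposition on fully observed snapshots can be sketched with NumPy as below. This covers only the classical method the abstract starts from, not the Mori-Zwanzig extension proposed in the paper; the function name `dmd`, the truncation `rank`, and the toy diagonal system are assumptions made for illustration.

```python
import numpy as np


def dmd(snapshots, rank):
    """Exact DMD of a snapshot matrix whose columns are consecutive states.

    Returns the DMD eigenvalues and the corresponding modes.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]  # pairs with Y ≈ A X
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Reduced operator U^H A U = U^H Y V S^{-1}; dividing by s scales columns.
    A_tilde = (U.conj().T @ Y @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V) / s) @ W  # exact DMD modes
    return eigvals, modes


# Toy linear system x_{k+1} = A x_k with known eigenvalues 0.9 and 0.5.
A = np.diag([0.9, 0.5])
x0 = np.array([1.0, 1.0])
snapshots = np.column_stack(
    [np.linalg.matrix_power(A, k) @ x0 for k in range(10)]
)
eigvals, modes = dmd(snapshots, rank=2)
print(np.sort(eigvals.real))  # should be close to the true eigenvalues
```

When some state variables are never measured, the snapshot pairs no longer satisfy a closed linear relation, which is the situation the abstract's optimal-prediction correction is meant to address.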
