Stanislav Budzinskiy

Numerical Error Analysis of Large Language Models

Mar 13, 2025

Stanislav Budzinskiy, Wenyi Fang, Longbin Zeng, Philipp Petersen

Abstract: Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture, which yields fundamental bounds for these effects. In addition, we conduct a series of numerical experiments that demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.
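
As a generic illustration of the kind of round-off the abstract refers to (not the paper's own analysis or bounds), the sketch below simulates a single attention logit $q^\top k/\sqrt{d}$ computed in IEEE half precision (binary16). Every product and partial sum is rounded with Python's `struct` `'e'` format, and the observed error is compared with the classical worst-case bound for recursive summation, $|\mathrm{fl}(q^\top k) - q^\top k| \le \gamma_d \sum_i |q_i k_i|$ with $\gamma_d = du/(1-du)$ and $u = 2^{-11}$. The head dimension `d`, the Gaussian inputs, and the function names are assumptions made for illustration.

```python
import math
import random
import struct

U_FP16 = 2.0 ** -11  # unit round-off of IEEE binary16 with round-to-nearest

def to_fp16(x: float) -> float:
    """Round a binary64 float to the nearest binary16 value (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot_fp16(q, k):
    """Left-to-right dot product; every product and partial sum is rounded to fp16."""
    acc = 0.0
    for a, b in zip(q, k):
        # For fp16 operands, the product and the sum are exact in binary64,
        # so rounding each result once reproduces a single fp16 operation.
        acc = to_fp16(acc + to_fp16(a * b))
    return acc

def attention_logit_error(d: int, seed: int = 0):
    """Return (observed error, a priori bound) for one logit q.k / sqrt(d) in fp16."""
    rng = random.Random(seed)
    q = [to_fp16(rng.gauss(0.0, 1.0)) for _ in range(d)]
    k = [to_fp16(rng.gauss(0.0, 1.0)) for _ in range(d)]
    exact = math.fsum(a * b for a, b in zip(q, k))  # reference value in binary64
    approx = dot_fp16(q, k)
    gamma = d * U_FP16 / (1.0 - d * U_FP16)
    bound = gamma * math.fsum(abs(a * b) for a, b in zip(q, k))
    scale = 1.0 / math.sqrt(d)  # applied in binary64, so it adds no fp16 error here
    return abs(approx - exact) * scale, bound * scale

for d in (16, 64, 256):
    err, bound = attention_logit_error(d)
    print(f"d={d:3d}  observed={err:.2e}  bound={bound:.2e}")
```

The bound is a worst-case estimate whose factor $\gamma_d$ grows linearly with `d`, so the observed errors are expected to sit below it; the gap between the two is one reason why sharper, architecture-specific analyses are of interest.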

When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

Jul 03, 2024

Stanislav Budzinskiy

Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$. We provide a theoretical explanation of the numerical results presented in support of this argument, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the squared Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to low-rank tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of attention in transformer neural networks.
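
The key point of this abstract is that the rank does not depend on $m$. A standard ingredient behind quantities of order $\log(n)\varepsilon^{-2}$ is a Johnson–Lindenstrauss-type Gaussian random projection $S \in \mathbb{R}^{k \times m}$, which preserves all $n^2$ inner products $\langle x_i, y_j \rangle$ of unit vectors up to an entrywise error controlled by $k$ and $n$, not by $m$. The pure-Python sketch below checks this numerically for class (i) inputs. It illustrates only this dimension-reduction ingredient and is not claimed to be the article's construction; the sizes and function names are illustrative assumptions. For a Lipschitz function $f$, the entrywise error in $f(\langle x_i, y_j \rangle)$ is at most the Lipschitz constant times the printed value.

```python
import math
import random

def unit_vector(m, rng):
    """A uniformly random point on the unit sphere in R^m."""
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(math.fsum(a * a for a in v))
    return [a / norm for a in v]

def dot(x, y):
    return math.fsum(a * b for a, b in zip(x, y))

def gaussian_projection(k, m, rng):
    """k x m matrix with i.i.d. N(0, 1/k) entries, so that E<Sx, Sy> = <x, y>."""
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(k)]

def project(S, x):
    return [dot(row, x) for row in S]

def max_inner_product_error(n, m, k, seed=0):
    """max_{i,j} |<S x_i, S y_j> - <x_i, y_j>| for n + n random unit vectors in R^m."""
    rng = random.Random(seed)
    xs = [unit_vector(m, rng) for _ in range(n)]
    ys = [unit_vector(m, rng) for _ in range(n)]
    S = gaussian_projection(k, m, rng)
    sx = [project(S, x) for x in xs]
    sy = [project(S, y) for y in ys]
    return max(abs(dot(px, py) - dot(x, y))
               for x, px in zip(xs, sx)
               for y, py in zip(ys, sy))

# Same target dimension k for ambient dimensions m that differ by a factor of 8.
for m in (50, 400):
    print(m, max_inner_product_error(n=8, m=m, k=200))
```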

Variational Bayesian inference for CP tensor completion with side information

Jun 29, 2022

Stanislav Budzinskiy, Nikolai Zamarashkin

Abstract: We propose a message passing algorithm, based on variational Bayesian inference, for low-rank tensor completion with automatic rank determination in the canonical polyadic format when additional side information (SI) is given. The SI comes in the form of low-dimensional subspaces that contain the fiber spans of the tensor (columns, rows, tubes, etc.). We validate the regularization properties induced by SI with extensive numerical experiments on synthetic and real-world data and present results on tensor recovery and rank determination. The results show that the number of samples required for successful completion is significantly reduced in the presence of SI. We also discuss the origin of a bump in the phase transition curves that exists when the dimensionality of SI is comparable with that of the tensor.
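
The notion of subspaces that contain the fiber spans can be made concrete with a small check. In the pure-Python sketch below, every CP factor column is drawn as $U\alpha$ from a given orthonormal basis $U$, so every column, row, and tube fiber of the resulting third-order tensor lies in the corresponding SI subspace, while a generic random tensor fails the same test. Sizes, names, and the random construction are illustrative assumptions; the sketch does not implement the paper's message passing algorithm. One standard consequence that hints at why SI can reduce the number of samples: such a tensor can be written as $\mathcal{G} \times_1 U \times_2 V \times_3 W$ with a core of $r_1 r_2 r_3$ entries instead of $n_1 n_2 n_3$.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(n, r, rng):
    """r orthonormal vectors in R^n (modified Gram-Schmidt on random Gaussian vectors)."""
    basis = []
    while len(basis) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in basis:
            c = dot(u, v)
            v = [a - c * b for a, b in zip(v, u)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:
            basis.append([a / norm for a in v])
    return basis

def residual_outside(basis, v):
    """Norm of the part of v orthogonal to span(basis); basis must be orthonormal."""
    for u in basis:
        c = dot(u, v)
        v = [a - c * b for a, b in zip(v, u)]
    return dot(v, v) ** 0.5

def cp_tensor_with_si(dims, si_dims, rank, seed=0):
    """Rank-`rank` CP tensor whose factor columns lie in random SI subspaces."""
    rng = random.Random(seed)
    bases = [orthonormal_basis(n, r, rng) for n, r in zip(dims, si_dims)]
    factors = []
    for basis in bases:
        cols = []
        for _ in range(rank):
            # each CP factor column is U @ alpha for a random coefficient vector alpha
            alpha = [rng.gauss(0.0, 1.0) for _ in basis]
            n = len(basis[0])
            cols.append([sum(c * u[i] for c, u in zip(alpha, basis)) for i in range(n)])
        factors.append(cols)
    A, B, C = factors
    n1, n2, n3 = dims
    X = [[[sum(A[r][i] * B[r][j] * C[r][k] for r in range(rank))
           for k in range(n3)] for j in range(n2)] for i in range(n1)]
    return X, bases

def max_fiber_residual(X, bases):
    """Largest distance of any column, row, or tube fiber of X from its SI subspace."""
    n1, n2, n3 = len(X), len(X[0]), len(X[0][0])
    U, V, W = bases
    columns = [[X[i][j][k] for i in range(n1)] for j in range(n2) for k in range(n3)]
    rows = [[X[i][j][k] for j in range(n2)] for i in range(n1) for k in range(n3)]
    tubes = [X[i][j] for i in range(n1) for j in range(n2)]
    return max(max(residual_outside(U, f) for f in columns),
               max(residual_outside(V, f) for f in rows),
               max(residual_outside(W, f) for f in tubes))

X, bases = cp_tensor_with_si(dims=(6, 5, 4), si_dims=(3, 3, 2), rank=2)
print(max_fiber_residual(X, bases))  # at round-off level
```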

Tensor train completion: local recovery guarantees via Riemannian optimization

Oct 08, 2021

Stanislav Budzinskiy, Nikolai Zamarashkin

Abstract: In this work we estimate the number of randomly selected elements of a tensor that with high probability guarantees local convergence of Riemannian gradient descent for tensor train completion. We derive a new bound for the orthogonal projections onto the tangent spaces based on the harmonic mean of the unfoldings' singular values and introduce a notion of core coherence for tensor trains. We also extend the results to tensor train completion with side information and obtain the corresponding local convergence guarantees.
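
For readers new to the format: in tensor-train (TT) format each entry is a product of matrix slices, $X(i_1,\dots,i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d)$, where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix and $r_0 = r_d = 1$. The $k$-th unfolding (rows indexed by $(i_1,\dots,i_k)$, columns by the remaining indices) then has rank at most $r_k$; these are the unfoldings whose singular values enter the bound described in the abstract. The pure-Python sketch below builds random TT cores, evaluates entries, checks the unfolding ranks, and draws a random set $\Omega$ of observed entries as in a completion problem. Sizes, ranks, and function names are illustrative assumptions; it does not implement Riemannian gradient descent or the paper's sample-complexity bound.

```python
import random
from itertools import product

def random_tt_cores(dims, ranks, seed=0):
    """Random TT cores; core k (0-based) is nested as core[a][i][b],
    with shape ranks[k] x dims[k] x ranks[k + 1] and ranks[0] = ranks[-1] = 1."""
    rng = random.Random(seed)
    return [[[[rng.gauss(0.0, 1.0) for _ in range(ranks[k + 1])]
              for _ in range(dims[k])]
             for _ in range(ranks[k])]
            for k in range(len(dims))]

def tt_entry(cores, index):
    """X(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d), as a running 1 x r_k row vector."""
    row = [1.0]
    for core, i in zip(cores, index):
        r_next = len(core[0][i])
        row = [sum(row[a] * core[a][i][b] for a in range(len(row))) for b in range(r_next)]
    return row[0]

def unfolding(cores, dims, k):
    """k-th unfolding: rows indexed by (i_1..i_k), columns by (i_{k+1}..i_d)."""
    rows = list(product(*(range(n) for n in dims[:k])))
    cols = list(product(*(range(n) for n in dims[k:])))
    return [[tt_entry(cores, r + c) for c in cols] for r in rows]

def numerical_rank(M, rtol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    A = [row[:] for row in M]
    m, n = len(A), len(A[0])
    scale = max(abs(x) for row in A for x in row) or 1.0
    rank, col = 0, 0
    while rank < m and col < n:
        pivot = max(range(rank, m), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) <= rtol * scale:
            col += 1
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rank + 1, m):
            f = A[i][col] / A[rank][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[rank])]
        rank += 1
        col += 1
    return rank

def sample_entries(cores, dims, num, seed=0):
    """Observe `num` distinct entries chosen uniformly at random (the set Omega)."""
    rng = random.Random(seed)
    omega = rng.sample(list(product(*(range(n) for n in dims))), num)
    return {idx: tt_entry(cores, idx) for idx in omega}

dims, ranks = (3, 4, 4, 3), (1, 2, 3, 2, 1)
cores = random_tt_cores(dims, ranks)
print([numerical_rank(unfolding(cores, dims, k)) for k in range(1, len(dims))])
observed = sample_entries(cores, dims, num=40, seed=1)
print(len(observed), "of", 3 * 4 * 4 * 3, "entries observed")
```

The cores hold $\sum_k r_{k-1} n_k r_k$ numbers, which grows linearly in the order $d$ rather than like $\prod_k n_k$; this is what makes recovery from a small random sample plausible in the first place.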
