Maksim Velikanov

Falcon Mamba: The First Competitive Attention-free 7B Language Model

Oct 07, 2024

SGD with memory: fundamental properties and stochastic acceleration

Oct 05, 2024

Falcon2-11B Technical Report

Jul 20, 2024

Generalization error of spectral algorithms

Mar 18, 2024

Efficient Conformal Prediction under Data Heterogeneity

Dec 25, 2023

Comparing the robustness of modern no-reference image- and video-quality metrics to adversarial attacks

Oct 10, 2023

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

Jun 22, 2022

Embedded Ensembles: Infinite Width Limit and Operating Regimes

Feb 24, 2022

Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions

Feb 02, 2022

Universal scaling laws in the gradient descent training of neural networks

May 02, 2021