Aleksandr Beznosikov

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Nov 12, 2024

Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning

Sep 22, 2024

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

Jun 06, 2024

Local Methods with Adaptivity via Scaling

Jun 02, 2024

Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

Apr 04, 2024

Optimal Data Splitting in Distributed Optimization for Machine Learning

Jan 15, 2024

Activations and Gradients Compression for Model-Parallel Training

Jan 15, 2024

Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

Oct 09, 2023

First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

May 25, 2023

Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

Apr 23, 2023