Alexander Gasnikov

Accelerated zero-order SGD under high-order smoothness and overparameterized regime

Nov 21, 2024
Via arXiv

Exploring Applications of State Space Models and Advanced Training Techniques in Sequential Recommendations: A Comparative Study on Efficiency and Performance

Aug 10, 2024
Via arXiv

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

Jun 06, 2024
Via arXiv

Local Methods with Adaptivity via Scaling

Jun 02, 2024
Via arXiv

Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

Apr 04, 2024
Via arXiv

Optimal Flow Matching: Learning Straight Trajectories in Just One Step

Mar 19, 2024
Via arXiv

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

Feb 07, 2024
Via arXiv

Activations and Gradients Compression for Model-Parallel Training

Jan 15, 2024
Via arXiv

Optimal Data Splitting in Distributed Optimization for Machine Learning

Jan 15, 2024
Via arXiv

Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems

Nov 07, 2023
Via arXiv