Mher Safaryan

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

Feb 04, 2026
Via arXiv

DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers

Feb 02, 2026
Via arXiv

DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models

May 28, 2025
Via arXiv

SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models

May 23, 2025
Via arXiv

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Oct 21, 2024
Via arXiv

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

Aug 30, 2024
Via arXiv

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

May 24, 2024
Via arXiv

AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms

Oct 31, 2023
Via arXiv

Knowledge Distillation Performs Partial Variance Reduction

May 27, 2023
Via arXiv

GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

Oct 28, 2022
Via arXiv