Parameswaran Raman

Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

Feb 03, 2026

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Dec 18, 2025

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

Apr 16, 2024

EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Apr 16, 2024

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

Apr 11, 2024

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Jan 17, 2024

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

Jan 05, 2024

Contractive error feedback for gradient compression

Dec 13, 2023

DS-FACTO: Doubly Separable Factorization Machines

Apr 29, 2020

Optimization on the Surface of the (Hyper)-Sphere

Sep 13, 2019