Anna Choromanska

Understanding Quantization of Optimizer States in LLM Pre-training: Dynamics of State Staleness and Effectiveness of State Resets

Mar 17, 2026
Via arXiv

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Mar 12, 2026
Via arXiv

Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting

Feb 13, 2026
Via arXiv

Streamlining Industrial Contract Management with Retrieval-Augmented LLMs

Nov 18, 2025
Via arXiv

Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

Oct 06, 2025
Via arXiv

A Survey of Optimization Methods for Training DL Models: Theoretical Perspective on Convergence and Generalization

Jan 24, 2025
Via arXiv

AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data

Jan 09, 2025
Via arXiv

Adjacent Leader Decentralized Stochastic Gradient Descent

May 18, 2024
Via arXiv

GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

Mar 07, 2024
Via arXiv

TAME: Task Agnostic Continual Learning using Multiple Experts

Oct 08, 2022
Via arXiv