Daniel Soudry

Tensor-Parallelism with Partially Synchronized Activations

Jun 24, 2025
Via arXiv

When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

Jun 23, 2025
Via arXiv

Optimal Rates in Continual Linear Regression via Increasing Regularization

Jun 06, 2025
Via arXiv

FP4 All the Way: Fully Quantized Training of LLMs

May 25, 2025
Via arXiv

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

May 25, 2025
Via arXiv

PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training

May 23, 2025
Via arXiv

Better Rates for Random Task Orderings in Continual Linear Models

Apr 06, 2025
Via arXiv

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Oct 24, 2024
Via arXiv

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks

Oct 02, 2024
Via arXiv

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Jun 10, 2024
Via arXiv