Daniel Soudry

Workspace Optimization: How to Train Your Agent

May 10, 2026
Via arXiv

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

May 07, 2026
Via arXiv

Normalized Architectures are Natively 4-Bit

May 07, 2026
Via arXiv

Optimal L2 Regularization in High-dimensional Continual Linear Regression

Jan 20, 2026
Via arXiv

Alias-Free ViT: Fractional Shift Invariance via Linear Attention

Oct 26, 2025
Via arXiv

Tensor-Parallelism with Partially Synchronized Activations

Jun 24, 2025
Via arXiv

When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

Jun 23, 2025
Via arXiv

Optimal Rates in Continual Linear Regression via Increasing Regularization

Jun 06, 2025
Via arXiv

FP4 All the Way: Fully Quantized Training of LLMs

May 25, 2025
Via arXiv

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

May 25, 2025
Via arXiv