Matus Telgarsky

UCSD

One-layer transformers fail to solve the induction heads task

Aug 26, 2024
Via arXiv

Spectrum Extraction and Clipping for Implicitly Linear Layers

Feb 25, 2024
Via arXiv

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

Feb 24, 2024
Via arXiv

Transformers, parallel computation, and logarithmic depth

Feb 14, 2024
Via arXiv

On Achieving Optimal Adversarial Test Error

Jun 13, 2023
Via arXiv

Representational Strengths and Limitations of Transformers

Jun 05, 2023
Via arXiv

Feature selection with gradient descent on two-layer networks in low-rotation regimes

Aug 04, 2022
Via arXiv

Convex Analysis at Infinity: An Introduction to Astral Space

May 06, 2022
Via arXiv

Stochastic linear optimization never overfits with quadratically-bounded losses on general data

Feb 14, 2022
Via arXiv

Actor-critic is implicitly biased towards high entropy optimal policies

Oct 21, 2021
Via arXiv