Nikolay Yudin

Mitigating Position-Shift Failures in Text-Based Modular Arithmetic via Position Curriculum and Template Diversity

Jan 07, 2026

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Nov 09, 2025

Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers

Jul 10, 2025

Group and Shuffle: Efficient Structured Orthogonal Parametrization

Jun 14, 2024