Zhanpeng Zhou

On the Cone Effect in the Learning Dynamics

Mar 20, 2025
Via arXiv

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Feb 26, 2025
Via arXiv

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

Oct 14, 2024
Via arXiv

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024
Via arXiv

Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm

Feb 06, 2024
Via arXiv

Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory

Oct 10, 2023
Via arXiv

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

Jul 17, 2023
Via arXiv

Defects of Convolutional Decoder Networks in Frequency Representation

Oct 17, 2022
Via arXiv

Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Jun 02, 2022
Via arXiv

A Unified Game-Theoretic Interpretation of Adversarial Robustness

Nov 08, 2021
Via arXiv