Zhangchen Zhou

A rationale from frequency perspective for grokking in training neural network

May 24, 2024
Via arXiv

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024
Via arXiv

Anchor function: a type of benchmark functions for studying language models

Jan 16, 2024
Via arXiv

Understanding the Initial Condensation of Convolutional Neural Networks

May 17, 2023
Via arXiv

Phase Diagram of Initial Condensation for Two-layer Neural Networks

Mar 12, 2023
Via arXiv