Zhi-Qin John Xu

A rationale from frequency perspective for grokking in training neural network

May 24, 2024

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

May 08, 2024

Loss Jump During Loss Switch in Solving PDEs with Neural Networks

May 06, 2024

Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

May 02, 2024

Understanding Time Series Anomaly State Detection through One-Class Classification

Feb 03, 2024

Anchor function: a type of benchmark functions for studying language models

Jan 16, 2024

An Unsupervised Deep Learning Approach for the Wave Equation Inverse Problem

Nov 08, 2023

Optimistic Estimate Uncovers the Potential of Nonlinear Models

Jul 18, 2023

Stochastic Modified Equations and Dynamics of Dropout Algorithm

May 25, 2023