Zhi-Qin John Xu

An Analysis for Reasoning Bias of Language Models with Small Initialization

Feb 05, 2025

Reasoning Bias of Next Token Prediction Training

Feb 04, 2025

Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Jan 15, 2025

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024

A rationale from frequency perspective for grokking in training neural network

May 24, 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

May 08, 2024

Loss Jump During Loss Switch in Solving PDEs with Neural Networks

May 06, 2024

Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

May 02, 2024

Understanding Time Series Anomaly State Detection through One-Class Classification

Feb 03, 2024

Anchor function: a type of benchmark functions for studying language models

Jan 16, 2024