Difan Zou

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

Feb 17, 2025
Via arXiv

Hyperspherical Energy Transformer with Recurrent Depth

Feb 17, 2025
Via arXiv

Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

Feb 07, 2025
Via arXiv

Masked Autoencoders Are Effective Tokenizers for Diffusion Models

Feb 05, 2025
Via arXiv

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

Jan 09, 2025
Via arXiv

Parallelized Autoregressive Visual Generation

Dec 19, 2024
Via arXiv

On the Feature Learning in Diffusion Models

Dec 02, 2024
Via arXiv

Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability

Nov 29, 2024
Via arXiv

An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

Nov 26, 2024
Via arXiv

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024
Via arXiv