Hanlin Zhang

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants (Feb 04, 2025)

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models (Dec 03, 2024)

How Does Critical Batch Size Scale in Pre-training? (Oct 29, 2024)

Eliminating Position Bias of Language Models: A Mechanistic Approach (Jul 01, 2024)

DataComp-LM: In search of the next generation of training sets for language models (Jun 18, 2024)

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training (Jun 15, 2024)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems (Feb 27, 2024)

A Study on the Calibration of In-context Learning (Dec 11, 2023)

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models (Nov 15, 2023)

DeepHEN: quantitative prediction essential lncRNA genes and rethinking essentialities of lncRNA genes (Sep 18, 2023)