Hannaneh Hajishirzi

Shammie

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Jun 23, 2025
Via arXiv

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

Jun 13, 2025
Via arXiv

Spurious Rewards: Rethinking Training Signals in RLVR

Jun 12, 2025
Via arXiv

ScienceMeter: Tracking Scientific Knowledge Updates in Language Models

May 30, 2025
Via arXiv

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

May 29, 2025
Via arXiv

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Apr 20, 2025
Via arXiv

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025
Via arXiv

Steering off Course: Reliability Challenges in Steering Language Models

Apr 06, 2025
Via arXiv

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Mar 11, 2025
Via arXiv

s1: Simple test-time scaling

Jan 31, 2025
Via arXiv