Noah A. Smith

Paul G. Allen School of Computer Science & Engineering, University of Washington; Allen Institute for Artificial Intelligence

BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning

Feb 03, 2026
Via arXiv

Are you going to finish that? A Practical Study of the Tokenization Boundary Problem

Jan 30, 2026
Via arXiv

Bolmo: Byteifying the Next Generation of Language Models

Dec 17, 2025
Via arXiv

Olmo 3

Dec 15, 2025
Via arXiv

Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior

Oct 16, 2025
Via arXiv

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Aug 18, 2025
Via arXiv

FlexOlmo: Open Language Models for Flexible Data Use

Jul 09, 2025
Via arXiv

LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR

Jun 23, 2025
Via arXiv

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Jun 23, 2025
Via arXiv

Sampling from Your Language Model One Byte at a Time

Jun 17, 2025
Via arXiv