Byung-Doh Oh

The Impact of Token Granularity on the Predictive Power of Language Model Surprisal

Dec 16, 2024
Via arXiv

Linear Recency Bias During Training Improves Transformers' Fit to Reading Times

Sep 17, 2024
Via arXiv

Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities

Jun 16, 2024
Via arXiv

Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times

Feb 03, 2024
Via arXiv

Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions

May 17, 2023
Via arXiv

Transformer-Based LM Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

Apr 22, 2023
Via arXiv

Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

Dec 23, 2022
Via arXiv

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

Dec 21, 2022
Via arXiv