Sharon Grosch

Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods

Mar 01, 2024
Via arXiv