Jinzhu Wu

Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Feb 18, 2025
Via arXiv