Sanmi Koyejo

Stanford University

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Mar 20, 2025
Via arXiv

Reliable and Efficient Amortized Model-based Evaluation

Mar 17, 2025
Via arXiv

Toward an Evaluation Science for Generative AI Systems

Mar 07, 2025
Via arXiv

TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records

Mar 06, 2025
Via arXiv

Position: Model Collapse Does Not Mean What You Think

Mar 05, 2025
Via arXiv

No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data

Feb 26, 2025
Via arXiv

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

Feb 24, 2025
Via arXiv

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Aligning Compound AI Systems via System-level DPO

Feb 24, 2025
Via arXiv

SycEval: Evaluating LLM Sycophancy

Feb 12, 2025
Via arXiv