Ali Emami

Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index

Dec 02, 2024
Via arXiv

NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers

Dec 02, 2024
Via arXiv

MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models

Sep 24, 2024
Via arXiv

STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions

Sep 20, 2024
Via arXiv

Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models

May 29, 2024
Via arXiv

Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge

May 28, 2024
Via arXiv

Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models

May 23, 2024
Via arXiv

EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries

Feb 22, 2024
Via arXiv

WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts

Jan 31, 2024
Via arXiv

Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models

May 23, 2023
Via arXiv