Besmira Nushi

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Mar 31, 2025
Via arXiv

MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation

Jan 07, 2025
Via arXiv

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

Oct 29, 2024
Via arXiv

Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models

Oct 29, 2024
Via arXiv

Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models

Oct 17, 2024
Via arXiv

Eureka: Evaluating and Understanding Large Foundation Models

Sep 13, 2024
Via arXiv

Understanding Information Storage and Transfer in Multi-modal Large Language Models

Jun 06, 2024
Via arXiv

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Apr 18, 2024
Via arXiv

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models

Apr 09, 2024
Via arXiv

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

Oct 24, 2023
Via arXiv