Negar Arabzadeh

Offline Evaluation of Set-Based Text-to-Image Generation

Oct 22, 2024
Via arXiv

IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents

Jul 12, 2024
Via arXiv

Assessing and Verifying Task Utility in LLM-Powered Applications

May 03, 2024
Via arXiv

Ranked List Truncation for Large Language Model-based Re-Ranking

Apr 28, 2024
Via arXiv

Generative Information Retrieval Evaluation

Apr 11, 2024
Via arXiv

A Comparison of Methods for Evaluating Generative IR

Apr 09, 2024
Via arXiv

Query Performance Prediction using Relevance Judgments Generated by Large Language Models

Apr 01, 2024
Via arXiv

Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

Feb 22, 2024
Via arXiv

Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels

Jan 31, 2024
Via arXiv

Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers

Jan 09, 2024
Via arXiv