
Anjana Arunkumar

LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity

Apr 12, 2023

Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow

Feb 09, 2023

Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task

Oct 14, 2022

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Oct 14, 2022

Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications

Oct 10, 2022

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Apr 16, 2022

A Proposal to Study "Is High Quality Data All We Need?"

Mar 12, 2022

Front Contribution instead of Back Propagation

Jun 10, 2021

How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation

Jun 10, 2021

DQI: A Guide to Benchmark Evaluation

Aug 10, 2020