Rebecca Qian

Lynx: An Open Source Hallucination Evaluation Model

Jul 11, 2024
Via arXiv

FinanceBench: A New Benchmark for Financial Question Answering

Nov 20, 2023
Via arXiv

Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Nov 14, 2023
Via arXiv

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Nov 14, 2023
Via arXiv

Perturbation Augmentation for Fairer NLP

May 25, 2022
Via arXiv

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

Apr 19, 2022
Via arXiv

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

Jan 12, 2022
Via arXiv

droidlet: modular, heterogenous, multi-modal agents

Jan 25, 2021
Via arXiv