Tushar Khot

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Sep 11, 2024
Via arXiv

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Jul 26, 2024
Via arXiv

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Jul 01, 2024
Via arXiv

DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

Jun 10, 2024
Via arXiv

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Jun 10, 2024
Via arXiv

OLMo: Accelerating the Science of Language Models

Feb 07, 2024
Via arXiv

ADaPT: As-Needed Decomposition and Planning with Language Models

Nov 08, 2023
Via arXiv

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Nov 08, 2023
Via arXiv

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Jun 07, 2023
Via arXiv

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

May 26, 2023
Via arXiv