Ori Yoran

Preventing Rogue Agents Improves Multi-Agent Collaboration

Feb 09, 2025

The BrowserGym Ecosystem for Web Agent Research

Dec 10, 2024

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Jul 22, 2024

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Jul 08, 2024

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Oct 02, 2023

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Jul 24, 2023

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Apr 25, 2023

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

May 26, 2022

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

Jan 14, 2022

SCROLLS: Standardized CompaRison Over Long Language Sequences

Jan 10, 2022