Ori Yoran

The KoLMogorov Test: Compression by Code Generation

Mar 18, 2025

Preventing Rogue Agents Improves Multi-Agent Collaboration

Feb 09, 2025

The BrowserGym Ecosystem for Web Agent Research

Dec 10, 2024

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Jul 22, 2024

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Jul 08, 2024

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Oct 02, 2023

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Jul 24, 2023

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Apr 25, 2023

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

May 26, 2022

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

Jan 14, 2022