Alon Jacovi

CERCO UMR5549, ANITI

ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability

Jan 13, 2025
Via arXiv

The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input

Jan 06, 2025
Via arXiv

CoverBench: A Challenging Benchmark for Complex Claim Verification

Aug 06, 2024
Via arXiv

Data Contamination Report from the 2024 CONDA Shared Task

Jul 31, 2024
Via arXiv

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

Jun 29, 2024
Via arXiv

Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

Jun 19, 2024
Via arXiv

TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

Jun 05, 2024
Via arXiv

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Feb 02, 2024
Via arXiv

A Comprehensive Evaluation of Tool-Assisted Generation Strategies

Oct 16, 2023
Via arXiv

Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review

Oct 05, 2023
Via arXiv