Or Honovich

Keep Guessing? When Considering Inference Scaling, Mind the Baselines

Oct 20, 2024

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Feb 02, 2024

Surfacing Biases in Large Language Models using Contrastive Input Decoding

May 12, 2023

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

Dec 19, 2022

DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

Nov 10, 2022

LMentry: A Language Model Benchmark of Elementary Language Tasks

Nov 03, 2022

Instruction Induction: From Few Examples to Natural Language Task Descriptions

May 22, 2022

TRUE: Re-evaluating Factual Consistency Evaluation

Apr 11, 2022

$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

Apr 16, 2021