Peifeng Wang

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Apr 21, 2025
Via arXiv

A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Apr 12, 2025
Via arXiv

ReIFE: Re-evaluating Instruction-Following Evaluation

Oct 09, 2024
Via arXiv

Direct Judgement Preference Optimization

Sep 23, 2024
Via arXiv

SCOTT: Self-Consistent Chain-of-Thought Distillation

May 03, 2023
Via arXiv

PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales

Nov 03, 2022
Via arXiv

Do Language Models Perform Generalizable Commonsense Inference?

Jun 22, 2021
Via arXiv

Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation

Oct 24, 2020
Via arXiv

When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models

Oct 10, 2020
Via arXiv

Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

May 02, 2020
Via arXiv