Xinyun Chen

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

Feb 10, 2025

SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling

Jan 31, 2025

On Memorization of Large Language Models in Logical Reasoning

Oct 30, 2024

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Jun 06, 2024

Vulnerability Detection with Code Language Models: How Far Are We?

Mar 27, 2024

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Feb 23, 2024

Transformers Can Achieve Length Generalization But Not Robustly

Feb 14, 2024

Premise Order Matters in Reasoning with Large Language Models

Feb 14, 2024

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Feb 06, 2024

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Dec 08, 2023