Hyungjoo Chae

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Oct 17, 2024
Via arXiv

Evaluating Robustness of Reward Models for Mathematical Reasoning

Oct 02, 2024
Via arXiv

Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Sep 29, 2024
Via arXiv

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Jun 20, 2024
Via arXiv

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024
Via arXiv

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Apr 03, 2024
Via arXiv

Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

Mar 05, 2024
Via arXiv

VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models

Feb 28, 2024
Via arXiv

Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback

Nov 13, 2023
Via arXiv

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Oct 22, 2023
Via arXiv