Hugh Zhang

Planning In Natural Language Improves LLM Search For Code Generation

Sep 05, 2024
Via arXiv

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet

Aug 27, 2024
Via arXiv

Learning Goal-Conditioned Representations for Language Reward Models

Jul 18, 2024
Via arXiv

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Jun 06, 2024
Via arXiv

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

May 02, 2024
Via arXiv

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Feb 22, 2024
Via arXiv

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

Feb 19, 2024
Via arXiv

Chain-of-Thought Reasoning is a Policy Improvement Operator

Sep 15, 2023
Via arXiv

Trading Off Diversity and Quality in Natural Language Generation

Apr 22, 2020
Via arXiv

Unifying Human and Statistical Evaluation for Natural Language Generation

Apr 04, 2019
Via arXiv