
He He

Transformers Struggle to Learn to Search

Dec 06, 2024

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Dec 05, 2024

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Nov 26, 2024

Spontaneous Reward Hacking in Iterative Self-Refinement

Jul 05, 2024

LLMs Are Prone to Fallacies in Causal Inference

Jun 18, 2024

Iterative Reasoning Preference Optimization

Apr 30, 2024

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Apr 24, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

Mar 30, 2024

Parallel Structures in Pre-training Data Yield In-Context Learning

Feb 19, 2024