Lilian Weng

Tony

Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

Dec 24, 2024

OpenAI o1 System Card

Dec 21, 2024

Rule Based Rewards for Language Model Safety

Nov 02, 2024

GPT-4o System Card

Oct 25, 2024

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Oct 09, 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Apr 19, 2024

A Holistic Approach to Undesired Content Detection in the Real World

Aug 05, 2022

Exploration in Deep Reinforcement Learning: A Survey

May 02, 2022

Text and Code Embeddings by Contrastive Pre-Training

Jan 24, 2022

Asymmetric self-play for automatic goal discovery in robotic manipulation

Jan 13, 2021