Tom Conerly

Constitutional AI: Harmlessness from AI Feedback

Dec 15, 2022
Via arXiv

In-context Learning and Induction Heads

Sep 24, 2022
Via arXiv

Language Models (Mostly) Know What They Know

Jul 16, 2022
Via arXiv

Scaling Laws and Interpretability of Learning from Repeated Data

May 21, 2022
Via arXiv

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Apr 12, 2022
Via arXiv