
Richard Edgar

Steering Language Model Refusal with Sparse Autoencoders

Nov 18, 2024
Via arXiv

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

Nov 28, 2023
Via arXiv

A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Oct 26, 2023
Via arXiv

Fairlearn: Assessing and Improving Fairness of AI Systems

Mar 29, 2023
Via arXiv