Picture for Hima Lakkaraju

Hima Lakkaraju

Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution

Add code
Jan 31, 2025
Figure 1 for Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Figure 2 for Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Figure 3 for Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Figure 4 for Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Viaarxiv icon

A Study on the Calibration of In-context Learning

Add code
Dec 11, 2023
Viaarxiv icon

Certifying LLM Safety against Adversarial Prompting

Add code
Sep 06, 2023
Figure 1 for Certifying LLM Safety against Adversarial Prompting
Figure 2 for Certifying LLM Safety against Adversarial Prompting
Figure 3 for Certifying LLM Safety against Adversarial Prompting
Figure 4 for Certifying LLM Safety against Adversarial Prompting
Viaarxiv icon

Fair Machine Unlearning: Data Removal while Mitigating Disparities

Add code
Jul 27, 2023
Viaarxiv icon

Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness

Add code
May 30, 2023
Viaarxiv icon