
Been Kim

Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Dec 09, 2024
Via arXiv

Getting aligned on representational alignment

Nov 02, 2023
Via arXiv

Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

Oct 25, 2023
Via arXiv

State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding

Sep 21, 2023
Via arXiv

Don't trust your eyes: on the reliability of feature visualizations

Jun 21, 2023
Via arXiv

Gaussian Process Probes (GPP) for Uncertainty-Aware Probing

May 29, 2023
Via arXiv

Model evaluation for extreme risks

May 24, 2023
Via arXiv

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Jan 10, 2023
Via arXiv

Impossibility Theorems for Feature Attribution

Dec 22, 2022
Via arXiv

On the Relationship Between Explanation and Prediction: A Causal View

Dec 20, 2022
Via arXiv