Shashwat Goel

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Jul 03, 2025

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Feb 26, 2025

Great Models Think Alike and this Undermines AI Oversight

Feb 06, 2025

A Cognac shot to forget bad memories: Corrective Unlearning in GNNs

Dec 01, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024

Corrective Machine Unlearning

Feb 21, 2024

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023

Proportional Aggregation of Preferences for Sequential Decision Making

Jun 26, 2023

Low impact agency: review and discussion

Mar 06, 2023

Evaluating Inexact Unlearning Requires Revisiting Forgetting

Jan 17, 2022