Shashwat Goel

Scaling Open-Ended Reasoning to Predict the Future

Dec 31, 2025
Via arXiv

Training AI Co-Scientists Using Rubric Rewards

Dec 29, 2025
Via arXiv

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Sep 17, 2025
Via arXiv

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Sep 11, 2025
Via arXiv

What if I ask in alia lingua? Measuring Functional Similarity Across Languages

Sep 04, 2025
Via arXiv

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Jul 03, 2025
Via arXiv

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Feb 26, 2025
Via arXiv

Great Models Think Alike and this Undermines AI Oversight

Feb 06, 2025
Via arXiv

A Cognac shot to forget bad memories: Corrective Unlearning in GNNs

Dec 01, 2024
Via arXiv

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv