Chatrik Singh Mangat

From Stability to Inconsistency: A Study of Moral Preferences in LLMs

Apr 08, 2025
Via arXiv

FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research

Mar 29, 2025
Via arXiv

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024
Via arXiv