Satyapriya Krishna

Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL

Oct 16, 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Jul 20, 2024

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

Apr 29, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Apr 10, 2024

Understanding the Effects of Iterative Prompting on Truthfulness

Feb 09, 2024

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024

On the Intersection of Self-Correction and Trust in Language Models

Nov 06, 2023

Are Large Language Models Post Hoc Explainers?

Oct 10, 2023

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Sep 28, 2023

Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten

Feb 10, 2023