Prateek Mittal

Capturing the Temporal Dependence of Training Data Influence

Dec 12, 2024

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024

Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments

Oct 16, 2024

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Oct 09, 2024

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Jun 25, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024

Data Shapley in One Training Run

Jun 16, 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024