Prateek Mittal

Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments

Oct 16, 2024
Via arXiv

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Oct 09, 2024
Via arXiv

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Jun 25, 2024
Via arXiv

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024
Via arXiv

Data Shapley in One Training Run

Jun 16, 2024
Via arXiv

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024
Via arXiv

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024
Via arXiv

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024
Via arXiv

Position Paper: Beyond Robustness Against Single Attack Types

May 02, 2024
Via arXiv

Teach LLMs to Phish: Stealing Private Information from Language Models

Mar 01, 2024
Via arXiv