Prateek Mittal

The Deployment of End-to-End Audio Language Models Should Take into Account the Principle of Least Privilege

Mar 21, 2025 (via arXiv)

AI Agents in Cryptoland: Practical Attacks and No Silver Bullet

Mar 20, 2025 (via arXiv)

Privacy Auditing of Large Language Models

Mar 09, 2025 (via arXiv)

Adapting to Evolving Adversaries with Regularized Continual Robust Training

Feb 06, 2025 (via arXiv)

Capturing the Temporal Dependence of Training Data Influence

Dec 12, 2024 (via arXiv)

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024 (via arXiv)

Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments

Oct 16, 2024 (via arXiv)

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Oct 09, 2024 (via arXiv)

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Jun 25, 2024 (via arXiv)

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024 (via arXiv)