Rishabh Bhardwaj

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Sep 17, 2024
Via arXiv

Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique

Aug 20, 2024
Via arXiv

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Aug 07, 2024
Via arXiv

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Jun 17, 2024
Via arXiv

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Jun 17, 2024
Via arXiv

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

Apr 06, 2024
Via arXiv

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

Feb 19, 2024
Via arXiv

Adapter Pruning using Tropical Characterization

Oct 30, 2023
Via arXiv

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases

Oct 22, 2023
Via arXiv

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

Aug 30, 2023
Via arXiv