
Ninareh Mehrabi

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Oct 07, 2024

Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification

Oct 07, 2024

Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs

Jul 31, 2024

Prompt Perturbation Consistency Learning for Robust Language Models

Feb 24, 2024

Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity

Dec 21, 2023

JAB: Joint Adversarial Prompting and Belief Augmentation

Nov 16, 2023

On the steerability of large language models toward data-driven personas

Nov 08, 2023

FLIRT: Feedback Loop In-context Red Teaming

Aug 08, 2023

Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Nov 17, 2022

Robust Conversational Agents against Imperceptible Toxicity Triggers

May 05, 2022