Picture for Inkit Padhi

Inkit Padhi

Granite Guardian

Add code
Dec 10, 2024
Viaarxiv icon

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Add code
Dec 05, 2024
Viaarxiv icon

Programming Refusal with Conditional Activation Steering

Add code
Sep 06, 2024
Figure 1 for Programming Refusal with Conditional Activation Steering
Figure 2 for Programming Refusal with Conditional Activation Steering
Figure 3 for Programming Refusal with Conditional Activation Steering
Figure 4 for Programming Refusal with Conditional Activation Steering
Viaarxiv icon

Value Alignment from Unstructured Text

Add code
Aug 19, 2024
Figure 1 for Value Alignment from Unstructured Text
Figure 2 for Value Alignment from Unstructured Text
Figure 3 for Value Alignment from Unstructured Text
Figure 4 for Value Alignment from Unstructured Text
Viaarxiv icon

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

Add code
Jul 08, 2024
Figure 1 for When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Figure 2 for When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Figure 3 for When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Figure 4 for When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Viaarxiv icon

WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia

Add code
Jun 19, 2024
Figure 1 for WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Figure 2 for WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Figure 3 for WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Figure 4 for WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Viaarxiv icon

Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

Add code
Jun 17, 2024
Viaarxiv icon

Contextual Moral Value Alignment Through Context-Based Aggregation

Add code
Mar 19, 2024
Figure 1 for Contextual Moral Value Alignment Through Context-Based Aggregation
Figure 2 for Contextual Moral Value Alignment Through Context-Based Aggregation
Figure 3 for Contextual Moral Value Alignment Through Context-Based Aggregation
Viaarxiv icon

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Add code
Mar 09, 2024
Viaarxiv icon

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Add code
Mar 08, 2024
Viaarxiv icon