Manish Nagireddy

Granite Guardian

Dec 10, 2024

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

Value Alignment from Unstructured Text

Aug 19, 2024

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

Jul 08, 2024

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

Apr 03, 2024

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

Mar 22, 2024

Multi-Level Explanations for Generative Language Models

Mar 21, 2024

Contextual Moral Value Alignment Through Context-Based Aggregation

Mar 19, 2024

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Mar 09, 2024

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Mar 08, 2024