Pierre Dognin

Granite Guardian

Dec 10, 2024

Evaluating the Prompt Steerability of Large Language Models

Nov 19, 2024

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

Value Alignment from Unstructured Text

Aug 19, 2024

Contextual Moral Value Alignment Through Context-Based Aggregation

Mar 19, 2024

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Mar 09, 2024

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Mar 08, 2024

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

May 02, 2023

Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting

Dec 13, 2022

Knowledge Graph Generation From Text

Nov 18, 2022