Federico Bianchi

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models

Oct 28, 2024

h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment

Aug 09, 2024

TextGrad: Automatic "Differentiation" via Text

Jun 11, 2024

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content

Feb 21, 2024

How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

Feb 08, 2024

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Sep 25, 2023

Vehicle-to-Grid and ancillary services: a profitability analysis under uncertainty

Sep 20, 2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Aug 02, 2023

E Pluribus Unum: Guidelines on Multi-Objective Evaluation of Recommender Systems

Apr 20, 2023

EvalRS 2023. Well-Rounded Recommender Systems For Real-World Deployments

Apr 19, 2023