Felix Friedrich

Navigating Shortcuts, Spurious Correlations, and Confounders: From Origins via Detection to Mitigation

Dec 06, 2024

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs

Nov 11, 2024

LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Jun 07, 2024

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Apr 06, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Mar 30, 2024

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

Jan 31, 2024

LEDITS++: Limitless Image Editing using Text-to-Image Models

Nov 28, 2023

Learning by Self-Explaining

Sep 15, 2023

Learning to Intervene on Concept Bottlenecks

Aug 25, 2023

Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?

May 28, 2023