Leo Schwinn

LLM-Safety Evaluations Lack Robustness

Mar 04, 2025

Joint Out-of-Distribution Filtering and Data Discovery Active Learning

Mar 04, 2025

A generative approach to LLM harmfulness detection with special red flag tokens

Feb 22, 2025

Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives

Feb 17, 2025

Extracting Unlearned Information from LLMs with Activation Steering

Nov 04, 2024

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Oct 04, 2024

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Oct 03, 2024

Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision

Aug 19, 2024

Relaxing Graph Transformers for Adversarial Attacks

Jul 16, 2024

Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation

Jun 19, 2024