Mehrnaz Mohfakhami

A generative approach to LLM harmfulness detection with special red flag tokens

Feb 22, 2025
Via arXiv