Ruixuan Huang

Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

Apr 30, 2024
Via arXiv

Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector

Apr 18, 2024
Via arXiv