Zirui He

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models

Feb 17, 2025
Via arXiv

Mitigating Shortcuts in Language Models with Soft Label Encoding

Sep 17, 2023
Via arXiv