
Yisen Wang

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Jan 03, 2025

An Inclusive Theoretical Framework of Robust Supervised Contrastive Loss against Label Noise

Jan 02, 2025

Understanding Difficult-to-learn Examples in Contrastive Learning: A Theoretical Framework for Spectral Contrastive Learning

Jan 02, 2025

MADE: Graph Backdoor Defense with Masked Unlearning

Nov 26, 2024

Understanding the Role of Equivariance in Self-supervised Learning

Nov 10, 2024

Dissecting the Failure of Invariant Learning on Graphs

Nov 05, 2024

What is Wrong with Perplexity for Long-context Language Modeling?

Oct 31, 2024

Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Oct 27, 2024

Can In-context Learning Really Generalize to Out-of-distribution Tasks?

Oct 13, 2024

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

Oct 11, 2024