Bang An

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

Oct 10, 2024

SAFLEX: Self-Adaptive Augmentation via Feature Label Extrapolation

Oct 03, 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Sep 01, 2024

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Jul 24, 2024

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Jun 21, 2024

Referee-Meta-Learning for Fast Adaptation of Locational Fairness

Feb 20, 2024

Benchmarking the Robustness of Image Watermarks

Jan 22, 2024

Explore Spurious Correlations at the Concept Level in Language Models for Text Classification

Nov 15, 2023

C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder

Oct 26, 2023

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Oct 23, 2023