Bang An

Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Dec 04, 2024

Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine

Nov 20, 2024

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

Oct 10, 2024

SAFLEX: Self-Adaptive Augmentation via Feature Label Extrapolation

Oct 03, 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Sep 01, 2024

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Jul 24, 2024

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Jun 21, 2024

Referee-Meta-Learning for Fast Adaptation of Locational Fairness

Feb 20, 2024

Benchmarking the Robustness of Image Watermarks

Jan 22, 2024

Explore Spurious Correlations at the Concept Level in Language Models for Text Classification

Nov 15, 2023