Dongxia Wang

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

Aug 18, 2024

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

May 28, 2024

Does Knowledge Graph Really Matter for Recommender Systems?

Apr 04, 2024

FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

Feb 06, 2024

FairRec: Fairness Testing for Deep Recommender Systems

Apr 14, 2023

Stability of Weighted Majority Voting under Estimated Weights

Jul 13, 2022