Shang Shang

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

May 07, 2024

Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

Abstract: To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.

The Application of Differential Privacy for Rank Aggregation: Privacy and Accuracy

Sep 24, 2014

Shang Shang, Tiance Wang, Paul Cuff, Sanjeev Kulkarni

Abstract: The potential risk of privacy leakage prevents users from sharing their honest opinions on social platforms. This paper addresses the problem of privacy preservation if the query returns the histogram of rankings. The framework of differential privacy is applied to rank aggregation. The error probability of the aggregated ranking is analyzed as a result of noise added in order to achieve differential privacy. Upper bounds on the error rates for any positional ranking rule are derived under the assumption that profiles are uniformly distributed. Simulation results are provided to validate the probabilistic analysis.

* Fusion 2014
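
The idea in the abstract above (add noise to a histogram of rankings, then apply a positional rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of Laplace noise, the sensitivity value, the Borda rule as the positional rule, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import permutations


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_ranking_histogram(rankings, candidates, epsilon, rng, sensitivity=1.0):
    # Count how many voters submitted each possible ranking, then perturb
    # every bin. sensitivity=1.0 assumes neighbouring profiles differ by
    # adding or removing one voter (an assumption, not taken from the paper).
    counts = Counter(rankings)
    scale = sensitivity / epsilon
    return {
        perm: counts.get(perm, 0) + laplace_noise(scale, rng)
        for perm in permutations(candidates)
    }


def borda_aggregate(histogram, candidates):
    # Borda count, one example of a positional ranking rule: a candidate in
    # position p (0-based) of a ranking earns (m - 1 - p) points per vote.
    m = len(candidates)
    scores = {c: 0.0 for c in candidates}
    for perm, weight in histogram.items():
        for position, candidate in enumerate(perm):
            scores[candidate] += weight * (m - 1 - position)
    return sorted(candidates, key=lambda c: -scores[c])


candidates = ("a", "b", "c")
profile = [("a", "b", "c")] * 5 + [("b", "a", "c")] * 3 + [("c", "b", "a")]
rng = random.Random(0)
private_hist = noisy_ranking_histogram(profile, candidates, epsilon=0.5, rng=rng)
print(borda_aggregate(private_hist, candidates))
```

A smaller `epsilon` gives stronger privacy but larger noise, so the aggregated ranking is more likely to differ from the noise-free one; that trade-off is the error probability the abstract refers to.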

A Random Walk Based Model Incorporating Social Information for Recommendations

May 17, 2013

Shang Shang, Sanjeev R. Kulkarni, Paul W. Cuff, Pan Hui

Abstract: Collaborative filtering (CF) is one of the most popular approaches to building a recommendation system. In this paper, we propose a hybrid collaborative filtering model based on a Markovian random walk to address the data sparsity and cold start problems in recommendation systems. More precisely, we construct a directed graph whose nodes consist of items and users, together with item content, user profile and social network information. We incorporate users' ratings into edge settings in the graph model. The model provides personalized recommendations and predictions to individuals and groups. The proposed algorithms are evaluated on MovieLens and Epinions datasets. Experimental results show that the proposed methods perform well compared with other graph-based methods, especially in the cold start case.

* 2012 IEEE Machine Learning for Signal Processing Workshop (MLSP), 6 pages
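
As a rough illustration of a random walk over a directed user-item graph with rating-weighted edges, the sketch below uses a generic random walk with restart. It is a stand-in technique, not the model from the paper: the restart probability, the toy graph, and the names `random_walk_with_restart` and `recommend` are all hypothetical.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    # graph: {node: {neighbour: edge_weight}}; every neighbour must be a key.
    # Returns the approximate visiting probability of each node for a walker
    # that jumps back to `start` with probability `restart` at every step.
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            edges = graph[node]
            total = sum(edges.values())
            if total == 0:
                nxt[start] += mass  # dead end: send all mass back to start
                continue
            for neighbour, weight in edges.items():
                nxt[neighbour] += (1 - restart) * mass * weight / total
            nxt[start] += restart * mass
        prob = nxt
    return prob


def recommend(graph, user, items, already_rated):
    # Rank the items the user has not rated by visiting probability.
    prob = random_walk_with_restart(graph, user)
    unseen = [item for item in items if item not in already_rated]
    return sorted(unseen, key=lambda item: -prob[item])


# Toy graph (illustrative only): rating edges in both directions plus one
# social edge u1 -> u2, which lets u1 reach items rated only by u2.
graph = {
    "u1": {"i1": 5, "u2": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 1},
    "i1": {"u1": 5, "u2": 4},
    "i2": {"u2": 5},
    "i3": {"u2": 1},
}
print(recommend(graph, "u1", ["i1", "i2", "i3"], already_rated={"i1"}))
```

Because the social edge gives the walker a path to items that `u1` never rated, a user with few ratings can still receive a ranking, which is one intuition for why graph walks help with sparsity and cold start.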

Wisdom of the Crowd: Incorporating Social Influence in Recommendation Models

May 17, 2013

Shang Shang, Pan Hui, Sanjeev R. Kulkarni, Paul W. Cuff

Abstract: Recommendation systems have received considerable attention recently. However, most research has focused on improving the performance of collaborative filtering (CF) techniques. Social networks provide indispensable extra information about people's preferences, and should be used to improve the quality of recommendations. In this paper, we propose two recommendation models, for individuals and for groups respectively, based on social contagion and social influence network theory. In the recommendation model for individuals, we improve the result of collaborative filtering prediction with the social contagion outcome, which simulates the result of an information cascade in the decision-making process. In the recommendation model for groups, we apply social influence network theory to take interpersonal influence into account to form a settled pattern of disagreement, and then aggregate the opinions of group members. By introducing the concepts of susceptibility and interpersonal influence, the settled rating results are flexible, and inclined to members whose ratings are "essential".

* HotPost 2011, 6 pages
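
The group model described above can be sketched with the standard opinion update from social influence network theory, in which each member mixes the influence-weighted opinions of others with their own initial opinion according to their susceptibility. The update rule is the textbook form; the simple mean used for aggregation, the toy numbers, and the function names are assumptions, not details confirmed by the abstract.

```python
def settle_opinions(initial, influence, susceptibility, iterations=200):
    # initial[i]: member i's initial rating.
    # influence[i][j]: weight member i places on member j (rows sum to 1).
    # susceptibility[i] in [0, 1]: 0 means i never moves, 1 means i keeps
    # no attachment to their own initial rating.
    n = len(initial)
    current = list(initial)
    for _ in range(iterations):
        current = [
            susceptibility[i] * sum(influence[i][j] * current[j] for j in range(n))
            + (1 - susceptibility[i]) * initial[i]
            for i in range(n)
        ]
    return current


def group_rating(settled):
    # Aggregate the settled opinions; a plain mean is assumed here.
    return sum(settled) / len(settled)


# Toy group: member 0 is not susceptible at all, member 1 fully defers to
# member 0, so the settled ratings lean toward member 0's rating.
initial = [5.0, 1.0]
influence = [[0.0, 1.0], [1.0, 0.0]]
susceptibility = [0.0, 1.0]
settled = settle_opinions(initial, influence, susceptibility)
print(settled, group_rating(settled))
```

In this toy case the group rating ends up at the rating of the member who does not move, which matches the abstract's remark that settled results lean toward members whose ratings are "essential".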
