Buyun Liang

KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs

Feb 05, 2025

Buyun Liang, Kwan Ho Ryan Chan, Darshan Thaker, Jinqi Luo, René Vidal

Abstract:Jailbreak attacks exploit specific prompts to bypass LLM safeguards, causing the LLM to generate harmful, inappropriate, and misaligned content. Current jailbreaking methods rely heavily on carefully designed system prompts and numerous queries to achieve a single successful attack, which is costly and impractical for large-scale red-teaming. To address this challenge, we propose to distill the knowledge of an ensemble of SOTA attackers into a single open-source model, called Knowledge-Distilled Attacker (KDA), which is finetuned to automatically generate coherent and diverse attack prompts without the need for meticulous system prompt engineering. Compared to existing attackers, KDA achieves higher attack success rates and greater cost-time efficiency when targeting multiple SOTA open-source and commercial black-box LLMs. Furthermore, we conducted a quantitative diversity analysis of prompts generated by baseline methods and KDA, identifying diverse and ensemble attacks as key factors behind KDA's effectiveness and efficiency.

Optimization and Optimizers for Adversarial Robustness

Mar 23, 2023

Hengyue Liang, Buyun Liang, Le Peng, Ying Cui, Tim Mitchell, Ju Sun

Abstract:Empirical robustness evaluation (RE) of deep learning models against adversarial perturbations entails solving nontrivial constrained optimization problems. Existing numerical algorithms that are commonly used to solve them in practice predominantly rely on projected gradient methods, and mostly handle perturbations modeled by the $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver PyGRANSO with Constraint Folding (PWCF), which can add more reliability and generality to the state-of-the-art RE packages, e.g., AutoAttack. Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality. For generality, PWCF can handle perturbation models that are typically inaccessible to the existing projected gradient methods; the main requirement is that the distance metric be almost everywhere differentiable. Taking advantage of PWCF and other existing numerical algorithms, we further explore the distinct patterns in the solutions found for these optimization problems using various combinations of losses, perturbation models, and optimization algorithms. We then discuss the implications of these patterns for current robustness evaluation and adversarial training.
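
As a rough illustration of the constrained problems this abstract refers to, robustness evaluation is commonly posed in a max-loss form like the one below. The notation is an assumption made for this sketch and does not come from the listing: $f_\theta$ is the model, $\ell$ the loss, $(x, y)$ a clean input and its label, $d$ the perturbation distance, $\varepsilon$ the perturbation budget, and inputs are assumed to be scaled to $[0,1]^n$.

$$\max_{x'} \; \ell\bigl(y, f_\theta(x')\bigr) \quad \text{s.t.} \quad d(x, x') \le \varepsilon, \quad x' \in [0,1]^n$$

Projected gradient methods need a cheap projection onto the set $\{x' : d(x, x') \le \varepsilon\}$, which is why they are mostly limited to the $\ell_1$, $\ell_2$ and $\ell_\infty$ distances.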

Predicting the Future of the CMS Detector: Crystal Radiation Damage and Machine Learning at the LHC

Mar 23, 2023

Bhargav Joshi, Taihui Li, Buyun Liang, Roger Rusack, Ju Sun

Abstract:The 75,848 lead tungstate crystals in the CMS experiment at the CERN Large Hadron Collider are used to measure the energy of electrons and photons produced in the proton-proton collisions. The optical transparency of the crystals degrades slowly with radiation dose due to the beam-beam collisions. The transparency of each crystal is monitored with a laser monitoring system that tracks changes in the optical properties of the crystals due to radiation from the collision products. Predicting the optical transparency of the crystals, both in the short term and in the long term, is a critical task for the CMS experiment. We describe here the public data release, following FAIR principles, of the crystal monitoring data collected by the CMS Collaboration between 2016 and 2018. Besides describing the dataset and how to access it, we describe the problems that can be addressed with it, as well as an example solution based on a Long Short-Term Memory neural network developed to predict the future behavior of the crystals.

NCVX: A General-Purpose Optimization Solver for Constrained Machine and Deep Learning

Oct 03, 2022

Buyun Liang, Tim Mitchell, Ju Sun

Abstract:Imposing explicit constraints is relatively new but increasingly pressing in deep learning, stimulated by, e.g., trustworthy AI that performs robust optimization over complicated perturbation sets and scientific applications that need to respect physical laws and constraints. However, it can be hard to reliably solve constrained deep learning problems without optimization expertise. The existing deep learning frameworks do not admit constraints. General-purpose optimization packages can handle constraints but do not perform auto-differentiation and have trouble dealing with nonsmoothness. In this paper, we introduce a new software package called NCVX, whose initial release contains the solver PyGRANSO, a PyTorch-enabled general-purpose optimization package for constrained machine/deep learning problems, the first of its kind. NCVX inherits auto-differentiation, GPU acceleration, and tensor variables from PyTorch, and is built on freely available and widely used open-source frameworks. NCVX is available at https://ncvx.org, with detailed documentation and numerous examples from machine/deep learning and other fields.

* Submitted to the 14th International OPT Workshop on Optimization for Machine Learning. arXiv admin note: text overlap with arXiv:2111.13984

Optimization for Robustness Evaluation beyond $\ell_p$ Metrics

Oct 02, 2022

Hengyue Liang, Buyun Liang, Ying Cui, Tim Mitchell, Ju Sun

Abstract:Empirical evaluation of deep learning models against adversarial attacks entails solving nontrivial constrained optimization problems. Popular algorithms for solving these constrained problems rely on projected gradient descent (PGD) and require careful tuning of multiple hyperparameters. Moreover, PGD can only handle $\ell_1$, $\ell_2$, and $\ell_\infty$ attack models due to the use of analytical projectors. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver, PyGRANSO, with Constraint-Folding (PWCF) to add reliability and generality to robustness evaluation. PWCF 1) finds good-quality solutions without the need for delicate hyperparameter tuning, and 2) can handle general attack models, e.g., general $\ell_p$ ($p \geq 0$) and perceptual attacks, which are inaccessible to PGD-based algorithms.

* 5 pages, 1 figure, 3 tables, submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023) and the 14th International OPT Workshop on Optimization for Machine Learning
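
The "analytical projectors" mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' code and does not use PyGRANSO. The function names (`project_linf`, `project_l2`, `pgd_step`) are hypothetical and perturbations are plain Python lists. It shows why PGD is convenient for the $\ell_\infty$ and $\ell_2$ balls: the projection has a closed form.

```python
import math


def project_linf(delta, eps):
    """Project onto the l_inf ball of radius eps by clipping each coordinate."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Project onto the l_2 ball of radius eps by rescaling if the norm exceeds eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]


def pgd_step(delta, grad, step_size, eps, project):
    """One projected gradient ascent step on the perturbation delta.

    grad is the gradient of the loss with respect to delta, supplied by the
    caller (assumed here; in practice it would come from auto-differentiation).
    """
    moved = [d + step_size * g for d, g in zip(delta, grad)]
    return project(moved, eps)


# Example: one step from a zero perturbation, kept inside an l_inf ball of radius 0.3.
new_delta = pgd_step([0.0, 0.0], [1.0, -1.0], 0.5, 0.3, project_linf)
```

For a distance without such a closed-form projection (for example, a perceptual distance), there is no simple `project` function to pass in. That is the limitation the abstract attributes to PGD-based algorithms.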

NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning

Nov 27, 2021

Buyun Liang, Ju Sun

Abstract:Optimizing nonconvex (NCVX) problems, especially those that are nonsmooth (NSMT) and constrained (CSTR), is an essential part of machine learning and deep learning. But it is hard to reliably solve this type of problem without optimization expertise. Existing general-purpose NCVX optimization packages are powerful, but typically cannot handle nonsmoothness. GRANSO is among the first packages targeting NCVX, NSMT, CSTR problems. However, it has several limitations, such as the lack of auto-differentiation and GPU acceleration, which preclude broad deployment by non-experts. To lower the technical barrier for the machine learning community, we revamp GRANSO into a user-friendly and scalable Python package named NCVX, featuring auto-differentiation, GPU acceleration, tensor input, a scalable QP solver, and zero dependency on proprietary packages. As a highlight, NCVX can solve general CSTR deep learning problems, the first of its kind. NCVX is available at https://ncvx.org, with detailed documentation and numerous examples from machine learning and other fields.

* NCVX is available at https://ncvx.org
