Abstract: With the proliferation of AI agents in various domains, protecting the ownership of AI models has become crucial due to the significant investment in their development. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection measures. Model watermarking has emerged as a key technique to address this issue, embedding ownership information within models to assert rightful ownership during copyright disputes. This paper presents several contributions to model watermarking: a self-authenticating black-box watermarking protocol using hash techniques, a study on evidence forgery attacks using adversarial perturbations, a proposed defense involving a purification step to counter adversarial attacks, and a purification-agnostic proxy learning method to enhance watermark reliability and model performance. Experimental results demonstrate the effectiveness of these approaches in improving the security, reliability, and performance of watermarked models.
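A minimal sketch of how a hash-based, self-authenticating black-box watermark could work, assuming trigger inputs and target labels are both re-derivable from a hash of the owner's identity; the function names, trigger shape, and verification threshold are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch, not the paper's exact protocol: a trigger set derived
# from a cryptographic hash of the owner's identity, so the trigger set itself
# certifies who generated it and verification needs only black-box queries.
import hashlib
import numpy as np

def build_trigger_set(owner_id: str, num_triggers: int = 16, shape=(3, 32, 32)):
    """Seed trigger generation with SHA-256 of the owner's identity, so a
    verifier can reconstruct the full (trigger, label) set from the claim alone."""
    digest = hashlib.sha256(owner_id.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    triggers = rng.uniform(0.0, 1.0, size=(num_triggers, *shape)).astype(np.float32)
    labels = rng.integers(0, 10, size=num_triggers)   # target labels also hash-derived
    return triggers, labels

def verify_ownership(model_predict, owner_id: str, threshold: float = 0.9) -> bool:
    """Black-box verification: query the suspect model with the re-derived
    triggers and compare its outputs against the hash-derived labels."""
    triggers, labels = build_trigger_set(owner_id)
    predictions = model_predict(triggers)             # suspect model as a black box
    return float(np.mean(predictions == labels)) >= threshold
```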
Abstract: The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic reports or papers with little to no human contribution. Consequently, researchers have focused on developing detectors to address the misuse of LLMs. However, most existing methods prioritize achieving higher accuracy on restricted datasets, neglecting the crucial aspect of generalizability. This limitation hinders their practical application in real-life scenarios where reliability is paramount. In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate these issues concerning the misuse of LLMs in academic writing, we propose a reference-based Siamese detector named Synthetic-Siamese, which takes a pair of texts, one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness of previous detectors (OpenAI detector and DetectGPT) and significantly improves baseline performance in realistic academic writing scenarios by approximately 67% to 95%.
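A minimal sketch of the reference-based pairing idea, assuming a shared (Siamese) branch over precomputed sentence embeddings and a small classification head; the layer sizes, embedding dimension, and class name are assumptions for illustration, not the published Synthetic-Siamese architecture.

```python
# Sketch of a reference-based Siamese detector: both texts pass through the
# same branch (weight sharing), and a head scores whether the inquiry text is
# LLM-generated relative to the synthetic reference. Dimensions are assumed.
import torch
import torch.nn as nn

class SyntheticSiamese(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.shared_branch = nn.Sequential(           # shared weights = Siamese
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.head = nn.Linear(hidden_dim * 2, 1)

    def forward(self, inquiry_emb, reference_emb):
        z_q = self.shared_branch(inquiry_emb)         # inquiry text embedding
        z_r = self.shared_branch(reference_emb)       # reference text embedding
        return torch.sigmoid(self.head(torch.cat([z_q, z_r], dim=-1)))

# Usage sketch: embeddings would come from any off-the-shelf sentence encoder.
detector = SyntheticSiamese()
score = detector(torch.randn(1, 768), torch.randn(1, 768))  # P(inquiry is LLM-written)
```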
Abstract: Authentication mechanisms are at the forefront of defending the world from various types of cybercrime. Steganography can serve as an authentication solution by embedding a digital signature into a carrier object to ensure the integrity of the object and simultaneously lighten the burden of metadata management. However, steganographic distortion, albeit generally imperceptible to human sensory systems, might be inadmissible in fidelity-sensitive situations. This has led to the concept of reversible steganography. A fundamental element of reversible steganography is predictive analytics, for which powerful neural network models have been effectively deployed. As another core aspect, contemporary reversible steganographic coding is based primarily on heuristics and is therefore worth further study. While attempts have been made to realise automatic coding with neural networks, perfect reversibility is still unreachable via such unexplainable intelligent machinery. Instead of relying on deep learning, we aim to derive an optimal coding by means of mathematical optimisation. In this study, we formulate reversible steganographic coding as a nonlinear discrete optimisation problem with a logarithmic capacity constraint and a quadratic distortion objective. Linearisation techniques are developed to enable mixed-integer linear programming. Experimental results validate the near-optimality of the proposed optimisation algorithm benchmarked against a brute-force method.
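A hedged sketch of the general shape such a linearised formulation can take (the symbols and constraint form below are illustrative assumptions, not the paper's notation): once binary assignment variables are introduced, the quadratic distortion and the logarithmic capacity term enter only through precomputed coefficients, so the problem becomes a mixed-integer linear programme.

```latex
% Illustrative linearised form; notation is assumed, not taken from the paper.
% x_{ij} = 1 assigns prediction-error symbol i to codeword bin j;
% d_{ij} (quadratic distortion cost) and \log_2 m_j (bits carried by bin j)
% are constants precomputed per assignment, p_i is the frequency of symbol i,
% and C is the required payload capacity.
\begin{align}
\min_{x}\quad & \sum_{i,j} d_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j} x_{ij} = 1 \qquad \forall i \\
& \sum_{i,j} p_i \log_2 m_j \, x_{ij} \ge C \\
& x_{ij} \in \{0, 1\} \qquad \forall i, j
\end{align}
```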
Abstract: Artificial neural networks have advanced the frontiers of reversible steganography. The core strength of neural networks is the ability to render accurate predictions for a bewildering variety of data. Residual modulation is recognised as the most advanced reversible steganographic algorithm for digital images, the pivot of which is the predictive module. The function of this module is to predict pixel intensity given some pixel-wise contextual information. This task can be perceived as a low-level vision problem and hence neural networks for addressing a similar class of problems can be deployed. On top of the prior art, this paper analyses the predictive uncertainty and endows the predictive module with the option to abstain when encountering a high level of uncertainty. Uncertainty analysis can be formulated as a pixel-level binary classification problem and tackled by both supervised and unsupervised learning. In contrast to handcrafted statistical analytics, learning-based analytics can learn to follow some general statistical principles and simultaneously adapt to a specific predictor. Experimental results show that steganographic performance can be remarkably improved by adaptively filtering out the unpredictable regions with the learning-based uncertainty analysers.
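A minimal sketch of the supervised variant of this pixel-level binary classification, assuming that "unpredictable" pixels are defined as those whose prediction error exceeds some threshold; the network depth, channel counts, threshold, and function names are illustrative assumptions.

```python
# Sketch: a small CNN classifies each pixel as unpredictable (abstain) or
# usable, supervised by whether the predictor's error exceeded a threshold.
import torch
import torch.nn as nn

class UncertaintyAnalyser(nn.Module):
    """Pixel-level binary classifier: logit > 0 suggests abstaining from that pixel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, context):
        return self.net(context)                      # per-pixel abstention logits

def make_labels(predicted, ground_truth, error_threshold: float = 2.0):
    # Supervision signal: a pixel is labelled unpredictable when the
    # predictive module's absolute error exceeds the chosen threshold.
    return ((predicted - ground_truth).abs() > error_threshold).float()

def embeddable_mask(analyser, context, cutoff: float = 0.5):
    # Pixels flagged as unpredictable are filtered out before embedding.
    with torch.no_grad():
        abstain_prob = torch.sigmoid(analyser(context))
    return abstain_prob < cutoff
```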
Abstract: Recent advances in deep learning have led to a paradigm shift in reversible steganography. A fundamental pillar of reversible steganography is predictive modelling, which can be realised via deep neural networks. However, non-trivial errors exist in inferences about some out-of-distribution and noisy data. In view of this issue, we propose to consider uncertainty in predictive models based upon a theoretical framework of Bayesian deep learning. Bayesian neural networks can be regarded as self-aware machinery; that is, a machine that knows its own limitations. To quantify uncertainty, we approximate the posterior predictive distribution through Monte Carlo sampling with stochastic forward passes. We further show that predictive uncertainty can be disentangled into aleatoric and epistemic uncertainties and that these quantities can be learnt in an unsupervised manner. Experimental results demonstrate that Bayesian uncertainty analysis improves steganographic capacity-distortion performance.
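A minimal sketch of the described Monte Carlo approximation, assuming an MC-dropout-style network whose two output heads give a predictive mean and a log-variance per pixel (the heteroscedastic aleatoric term); the head design and sample count are assumptions, not the paper's exact configuration.

```python
# Sketch: approximate the posterior predictive with stochastic forward passes
# (dropout kept active at test time), then split the uncertainty into an
# aleatoric part (predicted data noise) and an epistemic part (model spread).
import torch

def mc_predict(model, x, num_samples: int = 20):
    model.train()                                  # keeps dropout layers stochastic
    means, log_vars = [], []
    with torch.no_grad():
        for _ in range(num_samples):
            mean, log_var = model(x)               # two output heads assumed
            means.append(mean)
            log_vars.append(log_var)
    means = torch.stack(means)                     # shape: (T, B, 1, H, W)
    log_vars = torch.stack(log_vars)

    predictive_mean = means.mean(dim=0)
    aleatoric = log_vars.exp().mean(dim=0)         # average predicted data noise
    epistemic = means.var(dim=0)                   # spread of the mean across passes
    return predictive_mean, aleatoric, epistemic
```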
Abstract: Deep-learning-centric reversible steganography has emerged as a promising research paradigm. A direct way of applying deep learning to reversible steganography is to construct an encoder and a decoder whose parameters are trained jointly, thereby learning the steganographic system as a whole. This end-to-end framework, however, falls short of the reversibility requirement because it is difficult for this kind of monolithic system, as a black box, to create or duplicate intricate reversible mechanisms. In response to this issue, a recent approach is to carve up the steganographic system and work on modules independently. In particular, neural networks are deployed in an analytics module to learn the data distribution, while an established mechanism is called upon to handle the remaining tasks. In this paper, we investigate the modular framework and deploy deep neural networks in a reversible steganographic scheme referred to as prediction-error modulation, in which an analytics module serves the purpose of pixel intensity prediction. The primary focus of this study is on deep-learning-based context-aware pixel intensity prediction. We address the unsolved issues reported in related literature, including the impact of pixel initialisation on prediction accuracy and the influence of uncertainty propagation in dual-layer embedding. Furthermore, we establish a connection between context-aware pixel intensity prediction and low-level computer vision and analyse the performance of several advanced neural networks.
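A simplified sketch of the prediction-error modulation pipeline in the dual-layer setting, assuming a checkerboard split between a context layer and a target layer and a basic error-expansion rule; the predictor is an arbitrary context-aware model passed in as a parameter, and overflow handling (e.g. a location map) is omitted for brevity.

```python
# Sketch of one embedding layer in a prediction-error modulation scheme:
# the context layer is kept intact, the target layer is predicted from it,
# and prediction errors are expanded or shifted to carry the payload.
import numpy as np

def checkerboard_masks(height: int, width: int):
    """Split pixels into two interleaved layers for dual-layer embedding."""
    grid = np.indices((height, width)).sum(axis=0) % 2
    return grid == 0, grid == 1                      # context mask, target mask

def embed_layer(image, predictor, bits):
    """Embed one bit into each target pixel whose prediction error is -1 or 0
    (a simplified expansion rule); other errors are shifted to stay decodable."""
    context_mask, target_mask = checkerboard_masks(*image.shape)
    context = np.where(context_mask, image, 0)
    predicted = predictor(context)                   # context-aware intensity prediction
    stego = image.astype(np.int32).copy()
    bit_iter = iter(bits)
    for r, c in zip(*np.nonzero(target_mask)):
        pred = int(round(predicted[r, c]))
        error = int(image[r, c]) - pred
        if error in (-1, 0):                         # expandable errors carry one bit
            new_error = 2 * error + next(bit_iter, 0)
        elif error >= 1:                             # remaining errors are shifted so
            new_error = error + 1                    # extraction stays unambiguous
        else:
            new_error = error - 1
        stego[r, c] = pred + new_error               # overflow handling omitted here
    return stego
```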