Teddy Furon

INRIA

Task-Agnostic Attacks Against Vision Foundation Models

Mar 05, 2025

Brian Pulfer, Yury Belousov, Vitaliy Kinakh, Teddy Furon, Slava Voloshynovskiy

Abstract:The study of security in machine learning mainly focuses on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question-answering and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models by measuring the impact of this attack on multiple downstream tasks and its transferability between models.
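
The idea of disrupting features rather than a task loss can be sketched on a toy problem. This is an illustration only, not the authors' framework: the linear map `W` stands in for a foundation model, and the names `feature_attack`, `eps`, `step`, and `iters` are hypothetical. The sketch runs signed-gradient ascent on the squared feature distance while keeping the perturbation inside an L-infinity ball.

```python
import random


def features(W, x):
    """Toy 'backbone': a linear map standing in for a foundation model."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def feature_attack(W, x, eps=0.1, step=0.02, iters=20, seed=0):
    """Maximize ||f(x + delta) - f(x)||^2 subject to |delta_j| <= eps."""
    rng = random.Random(seed)
    clean = features(W, x)
    # Random start: at delta = 0 the gradient of the objective is zero.
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(iters):
        adv = features(W, [xi + di for xi, di in zip(x, delta)])
        diff = [a - c for a, c in zip(adv, clean)]
        # For a linear map, the gradient w.r.t. delta is 2 * W^T * diff.
        grad = [
            2 * sum(W[i][j] * diff[i] for i in range(len(W)))
            for j in range(len(x))
        ]
        # Signed ascent step, then projection back onto the eps-ball.
        delta = [
            max(-eps, min(eps, di + step * ((g > 0) - (g < 0))))
            for di, g in zip(delta, grad)
        ]
    return [xi + di for xi, di in zip(x, delta)]
```

No downstream label or task head appears in the objective, which is what makes such an attack task-agnostic; a real attack would replace `features` with a deep network and compute the gradient by automatic differentiation.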

Watermark Anything with Localized Messages

Nov 11, 2024

Tom Sander, Pierre Fernandez, Alain Durmus, Teddy Furon, Matthijs Douze

Abstract:Image watermarking methods are not tailored to handle small watermarked areas. This restricts applications in real-world scenarios where parts of the image may come from different sources or have been edited. We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). The WAM embedder imperceptibly modifies the input image, while the extractor segments the received image into watermarked and non-watermarked areas and recovers one or several hidden messages from the areas found to be watermarked. The models are jointly trained at low resolution and without perceptual constraints, then post-trained for imperceptibility and multiple watermarks. Experiments show that WAM is competitive with state-of-the-art methods in terms of imperceptibility and robustness, especially against inpainting and splicing, even on high-resolution images. Moreover, it offers new capabilities: WAM can locate watermarked areas in spliced images and extract distinct 32-bit messages with less than 1 bit error from multiple small regions, each no larger than 10% of the image surface, even for small $256\times 256$ images.

* Under review. Code at https://github.com/facebookresearch/watermark-anything

Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks

Sep 26, 2024

Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy

Abstract:The vast amounts of digital content captured from the real world or AI-generated media necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation-model-based methods. While the robustness of these systems is well-documented, the security against adversarial attacks remains underexplored. This paper evaluates the security of foundation models' latent space digital watermarking systems that utilize adversarial embedding techniques. A series of experiments investigates the security dimensions under copy and removal attacks, providing empirical insights into these systems' vulnerabilities. All experimental code and results are available at https://github.com/vkinakh/ssl-watermarking-attacks

SWIFT: Semantic Watermarking for Image Forgery Thwarting

Jul 26, 2024

Gautier Evennou, Vivien Chappelier, Ewa Kijak, Teddy Furon

Abstract:This paper proposes a novel approach to image authentication and tampering detection by using watermarking as a communication channel for semantic information. We modify the HiDDeN deep-learning watermarking architecture to embed and extract high-dimensional real vectors representing image captions. Our method significantly improves robustness to both malicious and benign edits. We also introduce a local confidence metric correlated with Message Recovery Rate, enhancing the method's practical applicability. This approach bridges the gap between traditional watermarking and passive forensic methods, offering a robust solution for image integrity verification.

* Code will be released

Watermarking Makes Language Models Radioactive

Feb 22, 2024

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

Abstract:This paper investigates the radioactivity of LLM-generated texts, i.e. whether it is possible to detect that such input was used as training data. Conventional methods like membership inference can carry out this detection with some level of accuracy. We show that watermarked training data leaves traces easier to detect and much more reliable than membership inference. We link the contamination level to the watermark robustness, its proportion in the training set, and the fine-tuning process. We notably demonstrate that training on watermarked synthetic instructions can be detected with high confidence (p-value < 1e-5) even when as little as 5% of training text is watermarked. Thus, LLM watermarking, originally designed for detecting machine-generated text, gives the ability to easily identify if the outputs of a watermarked LLM were used to fine-tune another LLM.

Proactive Detection of Voice Cloning with Localized Watermarking

Jan 30, 2024

Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

Abstract:In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, which enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it ideal for large-scale and real-time applications.

* Code at https://github.com/facebookresearch/audioseal

Three Bricks to Consolidate Watermarks for Large Language Models

Jul 26, 2023

Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon

Abstract:The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than $10^{-6}$). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.

* Webpage at https://pierrefdz.github.io/publications/threebricks/
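
Why the choice of statistical test matters at low false-positive rates can be illustrated with a small sketch. The abstract does not specify the detection score, so the setup here is an assumption: a detector counts how many of `n_tokens` tokens fall in a pseudo-random favored subset covering a fraction `gamma` of the vocabulary, so that the count is Binomial under the no-watermark hypothesis. The function names are hypothetical, and this is not the paper's implementation.

```python
import math


def z_test_p_value(score, n_tokens, gamma=0.5):
    """p-value from a Gaussian approximation of the Binomial count."""
    mean = gamma * n_tokens
    std = math.sqrt(n_tokens * gamma * (1 - gamma))
    z = (score - mean) / std
    # One-sided upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


def exact_p_value(score, n_tokens, gamma=0.5):
    """Exact upper tail P(S >= score) for S ~ Binomial(n_tokens, gamma)."""
    return sum(
        math.comb(n_tokens, k) * gamma**k * (1 - gamma) ** (n_tokens - k)
        for k in range(score, n_tokens + 1)
    )
```

The two functions agree closely near the mean but diverge in the far tail, which is exactly the regime that matters when the target false-positive rate is very small.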

How to choose your best allies for a transferable attack?

Apr 05, 2023

Thibault Maho, Seyed-Mohsen Moosavi-Dezfooli, Teddy Furon

Abstract:The transferability of adversarial examples is a key issue in the security of deep neural networks. The possibility of an adversarial example crafted for a source model fooling another targeted model makes the threat of adversarial attacks more realistic. Measuring transferability is a crucial problem, but the Attack Success Rate alone does not provide a sound evaluation. This paper proposes a new methodology for evaluating transferability by putting distortion in a central position. This new tool shows that transferable attacks may perform far worse than a black box attack if the attacker randomly picks the source model. To address this issue, we propose a new selection mechanism, called FiT, which aims at choosing the best source model with only a few preliminary queries to the target. Our experimental results show that FiT is highly effective at selecting the best source model for multiple scenarios such as single-model attacks, ensemble-model attacks and multiple attacks (Code available at: https://github.com/t-maho/transferability_measure_fit).

The Stable Signature: Rooting Watermarks in Latent Diffusion Models

Mar 27, 2023

Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, Teddy Furon

Abstract:Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt, then cropped to keep $10\%$ of the content, with over $90\%$ accuracy at a false positive rate below $10^{-6}$.

* Website at https://pierrefdz.github.io/publications/stablesignature
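
The statistical test mentioned in the abstract can be sketched as a bit-matching test. This is a minimal illustration under assumptions the abstract does not state: extracted bits from a non-watermarked image are modeled as independent fair coin flips, and the 48-bit signature length used in the usage note is a hypothetical value. The function names are also hypothetical.

```python
import math


def match_tail(n_bits, n_matches):
    """P(at least n_matches of n_bits agree) when bits are fair coin flips."""
    favorable = sum(math.comb(n_bits, k) for k in range(n_matches, n_bits + 1))
    return favorable / 2**n_bits


def min_matches_for_fpr(n_bits, fpr):
    """Smallest match count whose chance probability is at most fpr.

    Returns n_bits + 1 if even a perfect match cannot reach the target.
    """
    for tau in range(n_bits + 2):
        if match_tail(n_bits, tau) <= fpr:
            return tau


def detect(extracted, signature, fpr=1e-6):
    """Flag the image when enough extracted bits match the signature."""
    matches = sum(e == s for e, s in zip(extracted, signature))
    return matches >= min_matches_for_fpr(len(signature), fpr)
```

For example, `min_matches_for_fpr(48, 1e-6)` gives the number of matching bits required for a hypothetical 48-bit signature; a longer signature lets the detector tolerate more bit errors at the same false positive rate.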

Mixer: DNN Watermarking using Image Mixup

Dec 06, 2022

Kassem Kallas, Teddy Furon

Abstract:It is crucial to protect the intellectual property rights of DNN models prior to their deployment. The DNN should perform two main tasks: its primary task and the watermarking task. This paper proposes a lightweight, reliable, and secure DNN watermarking scheme that attempts to establish strong ties between these two tasks. The samples triggering the watermarking task are generated using image Mixup from either training or testing samples. This means that there is an infinite number of triggers, not limited to the samples used to embed the watermark in the model at training. Extensive experiments on image classification models for different datasets, including exposure to a variety of attacks, show that the proposed watermarking provides protection with an adequate level of security and robustness.

* arXiv admin note: text overlap with arXiv:2206.11024
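
The Mixup operation used to generate triggers can be sketched as a convex combination of two images. This shows only the standard Mixup blend; how the paper selects image pairs, mixing weights, and trigger labels is not stated in the abstract and is not modeled here. Images are represented as flat lists of pixel values, and the name `mixup` and parameter `lam` are illustrative.

```python
def mixup(image_a, image_b, lam):
    """Blend two equally sized images: lam * a + (1 - lam) * b, per pixel."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
```

Because `lam` is continuous and any pair of samples can be mixed, the set of possible blended images is unbounded, which is consistent with the abstract's remark about the number of triggers.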
