Barry Chen

Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)

May 20, 2025

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

Abstract: Despite considerable progress in the development of machine-text detectors, it has been suggested that the problem is inherently hard, and therefore, that stakeholders should proceed under the assumption that machine-generated text cannot be reliably detected as such. We examine a recent such claim by Nicks et al. (2024) regarding the ease with which language models can be optimized to degrade the performance of machine-text detectors, including detectors not specifically optimized against. We identify a feature space – the stylistic feature space – that is robust to such optimization, and show that it may be used to reliably detect samples from language models optimized to prevent detection. Furthermore, we show that even when models are explicitly optimized against stylistic detectors, detection performance remains surprisingly unaffected. We then seek to understand if stylistic detectors are inherently more robust. To study this question, we explore a new paraphrasing approach that simultaneously aims to close the gap between human writing and machine writing in stylistic feature space while avoiding detection using traditional features. We show that when only a single sample is available for detection, this attack is universally effective across all detectors considered, including those that use writing style. However, as the number of samples available for detection grows, the human and machine distributions become distinguishable. This observation encourages us to introduce AURA, a metric that estimates the overlap between human and machine-generated distributions by analyzing how detector performance improves as more samples become available. Overall, our findings underscore previous recommendations to avoid reliance on machine-text detection.
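The abstract's observation that detection improves as more samples become available can be illustrated with a small sketch. This is not the paper's AURA metric, whose definition the abstract does not give. It assumes a hypothetical detector that emits one score per document (higher meaning more machine-like), pools scores by simple averaging over groups of `n` documents, and measures separability with AUROC. The score values, the `pool` helper, and the averaging rule are all illustrative assumptions.

```python
from statistics import mean


def auroc(machine_scores, human_scores):
    """Probability that a random machine score exceeds a random human score.

    Ties count as half a win. This is the pairwise definition of AUROC.
    """
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


def pool(scores, n):
    """Average consecutive groups of n per-document scores (assumed pooling rule)."""
    return [mean(scores[i:i + n]) for i in range(0, len(scores) - n + 1, n)]


# Hypothetical per-document detector scores; not data from the paper.
human = [0.2, 0.6, 0.3, 0.7, 0.1, 0.5, 0.4, 0.6]
machine = [0.5, 0.7, 0.4, 0.8, 0.3, 0.9, 0.6, 0.6]

# Report how separability changes with the number of documents pooled.
for n in (1, 2, 4):
    print(n, auroc(pool(machine, n), pool(human, n)))
```

In this toy data the two score distributions overlap for single documents but separate once scores are averaged over small groups, which is the qualitative effect the abstract describes.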

Are Paraphrases Generated by Large Language Models Invertible?

Oct 29, 2024

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

Abstract: Large language models can produce highly fluent paraphrases while retaining much of the original meaning. While this capability has a variety of helpful applications, it may also be abused by bad actors, for example to plagiarize content or to conceal their identity. This motivates us to consider the problem of paraphrase inversion: given a paraphrased document, attempt to recover the original text. To explore the feasibility of this task, we fine-tune paraphrase inversion models, both with and without additional author-specific context to help guide the inversion process. We explore two approaches to author-specific inversion: one using in-context examples of the target author's writing, and another using learned style representations that capture distinctive features of the author's style. We show that, when starting from paraphrased machine-generated text, we can recover significant portions of the document using a learned inversion model. When starting from human-written text, the variety of source writing styles poses a greater challenge for invertibility. However, even when the original tokens can't be recovered, we find the inverted text is stylistically similar to the original, which significantly improves the performance of plagiarism detectors and authorship identification systems that rely on stylistic markers.

Few-Shot Detection of Machine-Generated Text using Style Representations

Jan 12, 2024

Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews

Abstract: The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. For example, such models could be used for plagiarism, disinformation, spam, or phishing. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language model rather than a human. Some previous approaches to this problem have relied on supervised methods trained on corpora of confirmed human and machine-written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of further language models producing still more fluent text than the models used to train the detectors. Other previous approaches require access to the models that may have generated a document in question at inference or detection time, which is often impractical. In light of these challenges, we pursue a fundamentally different approach not relying on samples from language models of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state-of-the-art large language models like Llama 2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific language models of interest, our approach affords the ability to predict which model generated a given document.
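The few-shot attribution idea in the abstract, predicting the source of a document from a handful of examples per candidate, can be sketched as nearest-centroid classification in an embedding space. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors stand in for style representations that would come from a style encoder, and the labels, the `attribute` function, and the use of cosine similarity to class centroids are all hypothetical choices.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attribute(query, support):
    """Return the label whose support-set centroid is most similar to the query.

    `support` maps each candidate label to a few example embeddings.
    """
    prototypes = {label: centroid(vecs) for label, vecs in support.items()}
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Hypothetical style embeddings; real ones would be produced by a style encoder.
support = {
    "human": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "model_a": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "model_b": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}

print(attribute([0.15, 0.85, 0.05], support))
```

Because only centroids of the support examples are needed, adding a new candidate model in this sketch requires a few example embeddings rather than retraining a classifier.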

Sampled Image Tagging and Retrieval Methods on User Generated Content

Dec 02, 2016

Karl Ni, Kyle Zaragoza, Charles Foster, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude

Abstract: Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user-generated content (UGC) found in the wild (e.g., Google Photos, Flickr, etc.) have an almost unlimited number of unique words in the metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large-scale UGC using sampling methods and joint optimization of word embeddings. By using the Yahoo! Flickr Creative Commons (YFCC100M) dataset, such an approach builds robustness to common unstructured data issues that include but are not limited to irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance. As a result, the final proposed algorithm will not only yield comparable results to the state of the art in conventional image tagging, but will also enable a new capability to train algorithms on large-scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability.

Large-Scale Deep Learning on the YFCC100M Dataset

Feb 11, 2015

Karl Ni, Roger Pearce, Kofi Boakye, Brian Van Essen, Damian Borth, Barry Chen, Eric Wang

Abstract: We present a work-in-progress snapshot of learning with a 15-billion-parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning of concepts at the highest layers. We train our three-layer deep neural network on the Yahoo! Flickr Creative Commons 100M dataset. The dataset comprises approximately 99.2 million images and 800,000 user-created videos from Yahoo's Flickr image and video sharing platform. Training of our network takes eight days on 98 GPU nodes at the High Performance Computing Center at Lawrence Livermore National Laboratory. Encouraging preliminary results and future research directions are presented and discussed.
