Abstract: Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of short- to medium-length English Reddit posts written by 68k authors. We study how performance varies across evaluation conditions, including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.
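The abstract above names a reward that trades off soundness, sense, and privacy during reinforcement-learning fine-tuning, but does not spell out how such a reward might be composed. The sketch below is a hypothetical illustration of one way to combine the three terms into a scalar reward for a candidate rewrite; every scoring function, weight, and the mock attacker are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' method) of a combined reward that trades off
# soundness, sense, and privacy for a candidate rewrite. All scorers here are
# crude, hypothetical stand-ins for the components a real system would use.

def soundness_score(original: str, rewrite: str) -> float:
    """Semantic-preservation proxy in [0, 1]; a real system might use
    sentence embeddings. Here: Jaccard overlap of word sets."""
    a, b = set(original.lower().split()), set(rewrite.lower().split())
    return len(a & b) / max(len(a | b), 1)

def sense_score(rewrite: str) -> float:
    """Fluency proxy in [0, 1]; a real system might use LM perplexity.
    Here: a crude length-based stand-in."""
    return min(len(rewrite.split()) / 20.0, 1.0)

def privacy_score(rewrite: str, author_classifier) -> float:
    """One minus the attacker's confidence that the true author wrote the text."""
    return 1.0 - author_classifier(rewrite)

def reward(original, rewrite, author_classifier, w=(0.4, 0.2, 0.4)):
    """Scalar reward that an RL fine-tuning loop (e.g. PPO) could maximize."""
    return (w[0] * soundness_score(original, rewrite)
            + w[1] * sense_score(rewrite)
            + w[2] * privacy_score(rewrite, author_classifier))

# Toy usage: a mock attacker that keys on one characteristic phrase.
mock_attacker = lambda text: 0.9 if "to be honest" in text else 0.2
print(reward("to be honest I loved the movie",
             "frankly, the film was great for me", mock_attacker))
```

In such a setup, the relative weights control how aggressively the rewriter sacrifices fidelity or fluency to defeat the attacker; the weights shown are illustrative only.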
Abstract: For deep learning practitioners, tuning hyperparameters to optimize model performance can be a computationally expensive task. Though visualization can help practitioners relate hyperparameter settings to overall model performance, significant manual inspection is still required to choose the hyperparameter settings for the next batch of experiments. In response, we present a streamlined visualization system that enables deep learning practitioners to more efficiently explore, tune, and optimize hyperparameters across a batch of experiments. A key idea is to directly suggest improved hyperparameter values using a predictive mechanism, which we integrate with current visualization practices for deep learning. Moreover, an analysis of the variance in a selected performance metric with respect to the model's hyperparameters reveals the impact that individual hyperparameters have on that metric. We evaluate the tool through a user study with deep learning model builders, finding that participants had little difficulty adopting the tool and incorporating it into their workflow.
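The abstract describes a predictive mechanism that suggests hyperparameter values from past experiments without specifying the model behind it. The sketch below is one plausible reading under stated assumptions: fit a surrogate regressor on logged (hyperparameters, metric) pairs, rank unseen candidates by predicted performance, and read off feature importances as a rough view of which hyperparameters drive variance in the metric. The hyperparameter names, the random-forest surrogate, and the synthetic log are all assumptions, not the paper's system.

```python
# Minimal sketch (not the paper's system) of a predictive suggestion mechanism:
# fit a surrogate on logged runs, then rank candidate configurations for the
# next batch of experiments. Hyperparameters and data below are synthetic.
import itertools
from sklearn.ensemble import RandomForestRegressor

# Logged experiments: (learning_rate, batch_size, dropout) -> validation accuracy
past_configs = [
    [1e-2, 32, 0.1], [1e-3, 64, 0.3], [1e-4, 128, 0.5],
    [1e-3, 32, 0.1], [1e-2, 128, 0.3],
]
past_scores = [0.71, 0.83, 0.78, 0.85, 0.69]

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(past_configs, past_scores)

# Candidate grid for the next batch of experiments.
candidates = [list(c) for c in itertools.product(
    [1e-2, 1e-3, 1e-4], [32, 64, 128], [0.1, 0.3, 0.5])]
predicted = surrogate.predict(candidates)

# Suggest the top-3 configurations by predicted accuracy; feature importances
# give a coarse indication of which hyperparameters the metric is sensitive to.
top = sorted(zip(predicted, candidates), reverse=True)[:3]
for score, cfg in top:
    print(f"predicted acc {score:.3f} for lr={cfg[0]}, batch={cfg[1]}, dropout={cfg[2]}")
print("importances (lr, batch, dropout):", surrogate.feature_importances_)
```

A visualization front end could surface the top-ranked suggestions and the sensitivity estimates alongside the usual per-run performance plots, which matches the workflow the abstract describes at a high level.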