Abstract:Mass surveillance systems for voice over IP (VoIP) conversations pose a huge risk to privacy. These automated systems use learning models to analyze conversations, and upon detecting calls that involve specific topics, route them to a human agent. In this study, we present an adversarial learning-based framework for privacy protection for VoIP conversations. We present a novel algorithm that finds a universal adversarial perturbation (UAP), which, when added to the audio stream, prevents an eavesdropper from automatically detecting the conversation's topic. As shown in our experiments, the UAP is agnostic to the speaker or audio length, and its volume can be changed in real-time, as needed. In a real-world demonstration, we use a Teensy microcontroller that acts as an external microphone and adds the UAP to the audio in real-time. We examine different speakers, VoIP applications (Skype, Zoom), audio lengths, and speech-to-text models (Deep Speech, Kaldi). Our results in the real world suggest that our approach is a feasible solution for privacy protection.