Abstract:Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas. In most instances where these tasks are deployed in real-world scenarios, the models used in them have been shown to be susceptible to adversarial attacks, making it imperative for us to address the challenge of their adversarial robustness. Existing techniques for adversarial robustness fall into three broad categories: defensive distillation techniques, adversarial training techniques, and randomized or non-deterministic model based techniques. In this paper, we propose a novel neural network paradigm that falls under the category of randomized models for adversarial robustness, but differs from all existing techniques under this category in that it models each parameter of the network as a statistical distribution with learnable parameters. We show experimentally that this framework is highly robust to a variety of white-box and black-box adversarial attacks, while preserving the task-specific performance of the traditional neural network model.