Abstract: Self-distillation is a special type of knowledge distillation in which the student model has the same architecture as the teacher model. Despite using the same architecture and the same training data, self-distillation has been empirically observed to improve performance, especially when applied repeatedly. For such a process, there is a fundamental question of interest: How much gain is possible by applying multiple steps of self-distillation? To investigate this relative gain, we propose studying the simple but canonical task of linear regression. Our analysis shows that the excess risk achieved by multi-step self-distillation can significantly improve upon a single step of self-distillation, reducing the excess risk by a factor as large as $d$, where $d$ is the input dimension. Empirical results on regression tasks from the UCI repository show that multi-step self-distillation reduces the learned model's risk (MSE) by up to 47%.
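To make the multi-step procedure concrete, below is a minimal sketch of repeated self-distillation for linear (ridge) regression, where each step refits the same model on the previous model's predictions. The ridge penalty, the toy data, and the number of steps are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch of multi-step self-distillation for ridge regression
# (assumed setup; not the paper's exact procedure or hyperparameters).
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def self_distill(X, y, lam=1.0, steps=3):
    """Repeatedly refit on the previous model's predictions (self-distillation)."""
    targets = y
    for _ in range(steps):
        w = fit_ridge(X, targets, lam)
        targets = X @ w  # the teacher's predictions become the next student's targets
    return w

# Toy usage: noisy d-dimensional linear data.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

w_distilled = self_distill(X, y, lam=1.0, steps=3)
print("parameter error:", np.sum((w_distilled - w_true) ** 2))
```

Each distillation step shrinks the fitted solution along directions determined by the data covariance, which is the mechanism the analysis exploits to compare single-step and multi-step excess risk.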
Abstract: Training and using modern neural-network-based latent-variable generative models (such as Variational Autoencoders) often requires simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be in order to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map that impacts the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, an assumption made in much of the related literature that is not satisfied by many architectures used in practice (e.g., convolution- and pooling-based networks). Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when the data lies on a low-dimensional manifold.
Abstract: Deep Neural Networks (DNNs) have emerged as a powerful mechanism and are being increasingly deployed in real-world safety-critical domains. Despite their widespread success, their complex architecture makes proving any formal guarantees about them difficult. Identifying how logical notions of high-level correctness relate to the complex low-level network architecture is a significant challenge. In this project, we extend the ideas presented in and introduce a way to bridge the gap between the architecture and the high-level specifications. Our key insight is that instead of directly proving the required safety properties, we first prove properties that relate closely to the structure of the neural net and then use them to reason about the safety properties. We build theoretical foundations for our approach and empirically evaluate its performance through various experiments, achieving better results than the existing approach by identifying a larger region of the input space that guarantees a certain property on the output.