Jacob Roll

On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods

Oct 17, 2024

Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Abstract: A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations and fitting the mixture model simultaneously leads to the representation collapse problem. Regularizing the distribution of data points over the clusters is the prevalent strategy to avoid this issue. While this is sufficient to prevent full representation collapse, we show that a partial prototype collapse problem still exists in the DINO family of methods, which leads to significant redundancies in the prototypes. Such prototype redundancies serve as shortcuts for the method to achieve a marginal latent class distribution that matches the prescribed prior. We show that by encouraging the model to use diverse prototypes, the partial prototype collapse can be mitigated. Effective utilization of the prototypes enables the methods to learn more fine-grained clusters, encouraging more informative representations. We demonstrate that this is especially beneficial when pre-training on a long-tailed fine-grained dataset.

* First version of the paper appeared in OpenReview on 22 Sep 2023. Accepted to BMVC 2024
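
One way to picture the prototype redundancy described in this abstract is to count how many prototypes remain after near-duplicates are merged. The sketch below is only an illustration under stated assumptions: the function name `count_unique_prototypes`, the greedy grouping, and the cosine-similarity threshold of 0.99 are hypothetical choices, not the procedure used in the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def count_unique_prototypes(prototypes, threshold=0.99):
    """Greedily count prototypes that are not near-duplicates.

    A prototype counts as redundant when its cosine similarity to an
    already kept prototype is at least `threshold` (an assumed value).
    """
    kept = []
    for p in prototypes:
        p = l2_normalize(p)
        is_new = all(
            sum(a * b for a, b in zip(p, q)) < threshold for q in kept
        )
        if is_new:
            kept.append(p)
    return len(kept)


# Toy example: five prototypes, two of which duplicate another direction.
prototypes = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
unique = count_unique_prototypes(prototypes)
print(f"{unique} unique prototypes out of {len(prototypes)}")
```

A large gap between the number of initialized prototypes and the number of unique ones would be a sign of the partial collapse the abstract refers to.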

DINO as a von Mises-Fisher mixture model

May 17, 2024

Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Abstract: Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between $K$-dimensional probability vectors, obtained by applying a softmax function to the dot product between representations and learnt prototypes. Given the fact that the learned representations are $L^2$-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. With this interpretation, DINO assumes equal precision for all components when the prototypes are also $L^2$-normalized. Using this insight we propose DINO-vMF, which adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is also stable for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model is beneficial in terms of better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF vs iBOT and thereby show the relevance of our proposed modification also for other methods derived from DINO.

* Accepted to ICLR 2023
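
The effect of the normalization constants mentioned in this abstract can be sketched on a toy case. Everything below is an assumption made for illustration: the representations live on the unit sphere in three dimensions (where the von Mises-Fisher constant has the closed form $C_3(\kappa) = \kappa / (4\pi \sinh\kappa)$), the mixture weights are equal, the precision of each component is taken to be the prototype norm divided by the temperature, and the function names are hypothetical. Real models use far higher dimensions, and the paper's actual computation may differ.

```python
import math


def log_vmf_norm_3d(kappa):
    """log C_3(kappa) for a vMF density on the unit sphere in R^3.

    Uses C_3(kappa) = kappa / (4 * pi * sinh(kappa)), with log sinh
    computed in a form that stays stable for large kappa.
    """
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_probs(y, prototypes, temperature=0.1, use_vmf_norm=False):
    """Cluster assignment probabilities for a unit-norm representation y.

    With use_vmf_norm=False this is a plain softmax over scaled dot
    products. With use_vmf_norm=True, the log normalization constant of
    each component is added to its logit (assumed reading of the idea).
    """
    logits = []
    for w in prototypes:
        logit = sum(a * b for a, b in zip(w, y)) / temperature
        if use_vmf_norm:
            kappa = math.sqrt(sum(a * a for a in w)) / temperature
            logit += log_vmf_norm_3d(kappa)
        logits.append(logit)
    return softmax(logits)


y = [1.0, 0.0, 0.0]
unnormalized = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(assignment_probs(y, unnormalized))
print(assignment_probs(y, unnormalized, use_vmf_norm=True))
```

When all prototypes have the same norm, the added constants are identical across components and cancel in the softmax, which matches the abstract's remark that $L^2$-normalized prototypes correspond to equal precision for all components.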

Evaluating model calibration in classification

Feb 19, 2019

Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön

Abstract: Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that are consistent with the empirical frequencies observed from realized outcomes. A classifier with such a property is called calibrated. In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. Lastly, we propose new ways to quantify and visualize miscalibration in probabilistic classification, including novel multidimensional reliability diagrams.
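
To make the notion of calibration in this abstract concrete, the sketch below computes a binned expected calibration error, a widely used estimator that compares average confidence with empirical accuracy inside confidence bins. It is not the evaluation framework proposed in the paper; the function name, the equal-width binning, and the default of ten bins are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned expected calibration error for top-label predictions.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     booleans, True where the chosen class was right.
    Returns the bin-size-weighted mean of |avg confidence - accuracy|.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Overconfident toy model: always claims certainty, right half the time.
overconfident = expected_calibration_error([1.0] * 4, [True, False, True, False])
print(overconfident)
```

Binned estimates like this depend on the binning scheme and the sample size, which is one kind of subtlety that a careful calibration evaluation has to account for.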
