Sudhanshu Mittal

What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning

Jan 09, 2025

Jelena Bratulić, Sudhanshu Mittal, Christian Rupprecht, Thomas Brox

Abstract: Large Language Models (LLMs) have demonstrated impressive performance in various tasks, including In-Context Learning (ICL), where the model performs new tasks by conditioning solely on the examples provided in the context, without updating the model's weights. While prior research has explored the roles of pretraining data and model architecture, the key mechanism behind ICL remains unclear. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL. To disambiguate these factors, we conduct a study with a controlled dataset and data sequences using a deep autoregressive model. We show that conceptual repetitions in the data sequences are crucial for ICL, more so than previously indicated training data properties like burstiness or long-tail distribution. Conceptual repetitions could refer to $n$-gram repetitions in textual data or exact image copies in image sequence data. Such repetitions also offer other previously overlooked benefits such as reduced transiency in ICL performance. Furthermore, we show that the emergence of ICL depends on balancing the in-weight learning objective with the in-context solving ability during training.

Revisiting Deep Active Learning for Semantic Segmentation

Feb 08, 2023

Sudhanshu Mittal, Joshua Niemeijer, Jörg P. Schäfer, Thomas Brox

Abstract: Active learning automatically selects samples for annotation from a data pool to achieve maximum performance with minimum annotation cost. This is particularly critical for semantic segmentation, where annotations are costly. In this work, we show in the context of semantic segmentation that the data distribution is decisive for the performance of the various active learning objectives proposed in the literature. Particularly, redundancy in the data, as it appears in most driving scenarios and video datasets, plays a large role. We demonstrate that the integration of semi-supervised learning with active learning can improve performance when the two objectives are aligned. Our experimental study shows that current active learning benchmarks for segmentation in driving scenarios are not realistic since they operate on data that is already curated for maximum diversity. Accordingly, we propose a more realistic evaluation scheme in which the value of active learning becomes clearly visible, both by itself and in combination with semi-supervised learning.
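The selection step described in this abstract can be illustrated with a minimal sketch. It assumes entropy-based uncertainty sampling, which is one common active learning objective and not necessarily one the paper evaluates; the function names and the toy probabilities are hypothetical. For segmentation, the per-sample score would typically be aggregated over pixels, which is omitted here.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pool_probs, budget):
    """Return indices of the `budget` most uncertain samples in the pool.

    Assumption: uncertainty is measured by predictive entropy. Other
    acquisition functions could be swapped in via the sort key.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model predictions for four unlabeled samples, three classes.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
selected = select_for_annotation(pool_probs, budget=2)
print(selected)
```

With a budget of two, the sketch picks the nearly uniform prediction first, then the next most spread-out one. On a redundant pool such as consecutive video frames, a purely score-based ranking like this can select near-duplicates, which is the kind of data-distribution effect the abstract points to.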

Open-vocabulary Attribute Detection

Nov 23, 2022

María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

Abstract: Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page: https://ovad-benchmark.github.io/

Localized Vision-Language Matching for Open-vocabulary Object Detection

May 12, 2022

Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

Abstract: In this work, we propose an open-world object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-world detection approaches while being data-efficient.

Essentials for Class Incremental Learning

Feb 18, 2021

Sudhanshu Mittal, Silvio Galesso, Thomas Brox

Abstract: Contemporary neural networks are limited in their ability to learn from evolving streams of training data. When trained sequentially on new or evolving tasks, their accuracy drops sharply, making them unsuitable for many real-world applications. In this work, we shed light on the causes of this well-known yet unsolved phenomenon - often referred to as catastrophic forgetting - in a class-incremental setup. We show that a combination of simple components and a loss that balances intra-task and inter-task learning can already resolve forgetting to the same extent as more complex measures proposed in the literature. Moreover, we identify poor quality of the learned representation as another reason for catastrophic forgetting in class-IL. We show that performance is correlated with secondary class information (dark knowledge) learned by the model and that it can be improved by an appropriate regularizer. With these lessons learned, class-incremental learning results on CIFAR-100 and ImageNet improve over the state of the art by a large margin, while keeping the approach simple.
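The notion of secondary class information (dark knowledge) mentioned in this abstract can be made concrete with a small sketch. It assumes the common reading of the term, namely the relative probabilities a model assigns to the non-predicted classes, and uses a temperature-scaled softmax to make them visible. The logits and the temperature are hypothetical, and this is not presented as the regularizer or the measurement used in the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image over three classes.
logits = [8.0, 4.0, 1.0]

sharp = softmax(logits)                   # nearly one-hot
soft = softmax(logits, temperature=4.0)   # secondary classes become visible

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft])
```

At temperature 1 almost all probability mass sits on the top class, so the ordering among the remaining classes carries little weight. At a higher temperature the second and third classes receive noticeably more mass while their ranking is preserved, which is the information a distillation-style loss can transfer between model snapshots.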

Parting with Illusions about Deep Active Learning

Dec 11, 2019

Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, Thomas Brox

Abstract: Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various tasks. However, the conventional evaluation scheme used for deep active learning is below par. Current methods disregard some apparent parallel work in closely related fields. Active learning methods are quite sensitive to changes in the training procedure, such as data augmentation. They improve by a large margin when integrated with semi-supervised learning, but barely perform better than the random baseline. We re-implement several recent active learning approaches for image classification and evaluate them under more realistic settings. We further validate our findings for semantic segmentation. Based on our observations, we realistically assess the current state of the field and propose a more suitable evaluation protocol.

Semi-Supervised Semantic Segmentation with High- and Low-level Consistency

Aug 15, 2019

Sudhanshu Mittal, Maxim Tatarchenko, Thomas Brox

Abstract: The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only drawn attention recently. In this work, we propose an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images. It uses two network branches that link semi-supervised classification with semi-supervised segmentation including self-training. The dual-branch approach reduces both the low-level and the high-level artifacts typical when training with few labels. The approach attains significant improvement over existing methods, especially when trained with very few labeled samples. On several standard benchmarks - PASCAL VOC 2012, PASCAL-Context, and Cityscapes - the approach achieves a new state of the art in semi-supervised learning.
