Deep Chakraborty

A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters

Feb 12, 2025

Shasvat Desai, Debasmita Ghose, Deep Chakraborty

Abstract: Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.

* 9 pages, 3 figures
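
To make the role of positive and negative pairs concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor. The abstract does not name a specific loss, so the choice of InfoNCE, the cosine similarity, the `temperature` value, and the function names are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (illustrative, not the survey's method).

    The loss is low when the anchor is close to its positive and far
    from every negative, and it grows as negatives get more similar.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0]]   # points away from the anchor
hard_negatives = [[0.8, 0.2]]    # nearly as close as the positive

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, hard_negatives)
```

With the same anchor and positive, swapping an easy negative for a hard one raises the loss, which is one way to see why the choice of pairs affects the training signal.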

Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy Maximization

Nov 24, 2024

Deep Chakraborty, Yann LeCun, Tim G. J. Rudner, Erik Learned-Miller

Abstract: A number of different architectures and loss functions have been applied to the problem of self-supervised learning (SSL), with the goal of developing embeddings that provide the best possible pre-training for as-yet-unknown, lightly supervised downstream tasks. One of these SSL criteria is to maximize the entropy of a set of embeddings in some compact space. But the goal of maximizing the embedding entropy often depends, whether explicitly or implicitly, upon high-dimensional entropy estimates, which typically perform poorly in more than a few dimensions. In this paper, we motivate an effective entropy maximization criterion (E2MC), defined in terms of easy-to-estimate, low-dimensional constraints. We demonstrate that using it to continue training an already-trained SSL model for only a handful of epochs leads to a consistent and, in some cases, significant improvement in downstream performance. We perform careful ablation studies to show that the improved performance is due to the proposed add-on criterion. We also show that continued pre-training with alternative criteria does not lead to notable improvements, and in some cases, even degrades performance.

* 19 pages including appendix, 5 figures

Self-Supervised Learning to Guide Scientifically Relevant Categorization of Martian Terrain Images

Apr 21, 2022

Tejas Panambur, Deep Chakraborty, Melissa Meyer, Ralph Milliken, Erik Learned-Miller, Mario Parente

Abstract: Automatic terrain recognition in Mars rover images is an important problem not just for navigation, but for scientists interested in studying rock types, and by extension, conditions of the ancient Martian paleoclimate and habitability. Existing approaches to label Martian terrain either involve the use of non-expert annotators producing taxonomies of limited granularity (e.g. soil, sand, bedrock, float rock, etc.), or rely on generic class discovery approaches that tend to produce perceptual classes such as rover parts and landscape, which are irrelevant to geologic analysis. Expert-labeled datasets containing granular geological/geomorphological terrain categories are rare or inaccessible to the public, and sometimes require the extraction of relevant categorical information from complex annotations. In order to facilitate the creation of a dataset with detailed terrain categories, we present a self-supervised method that can cluster sedimentary textures in images captured from the Mast camera onboard the Curiosity rover (Mars Science Laboratory). We then present a qualitative analysis of these clusters and describe their geologic significance via the creation of a set of granular terrain categories. The precision and geologic validation of these automatically discovered clusters suggest that our methods are promising for the rapid classification of important geologic features and will therefore facilitate our long-term goal of producing a large, granular, and publicly available dataset for Mars terrain recognition.

* EarthVision at CVPR Workshops 2022. Code and datasets are available at https://github.com/TejasPanambur/mastcam

Pedestrian Detection in Thermal Images using Saliency Maps

Apr 15, 2019

Debasmita Ghose, Shasvat Mukeshkumar Desai, Sneha Bhattacharya, Deep Chakraborty, Madalina Fiterau, Tauhidur Rahman

Abstract: Thermal images are mainly used to detect the presence of people at night or in bad lighting conditions, but perform poorly during daytime. To solve this problem, most state-of-the-art techniques employ a fusion network that uses features from paired thermal and color images. Instead, we propose to augment thermal images with their saliency maps, to serve as an attention mechanism for the pedestrian detector, especially during daytime. We investigate how such an approach results in improved performance for pedestrian detection using only thermal images, eliminating the need for paired color images. For our experiments, we train the Faster R-CNN for pedestrian detection and report the added effect of saliency maps generated using static and deep methods (PiCA-Net and R3-Net). Our best performing model results in an absolute reduction of miss rate by 13.4% and 19.4% over the baseline in day and night images respectively. We also annotate and release pixel-level masks of pedestrians on a subset of the KAIST Multispectral Pedestrian Detection dataset, which is the first publicly available dataset for salient pedestrian detection.

* Accepted at CVPR 2019 Workshop (PBVS), 10 pages, 7 figures

Nonparallel Emotional Speech Conversion

Nov 03, 2018

Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye

Abstract: We propose a nonparallel data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training based on an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach.

* submitted to ICASSP 2019, 5 pages, 5 figures

Unsupervised Hard Example Mining from Videos for Improved Object Detection

Aug 13, 2018

SouYoung Jin, Aruni RoyChowdhury, Huaizu Jiang, Ashish Singh, Aditya Prasad, Deep Chakraborty, Erik Learned-Miller

Abstract: Important gains have recently been obtained in object detection by using training objectives that focus on hard negative examples, i.e., negative examples that are currently rated as positive or ambiguous by the detector. These examples can strongly influence parameters when the network is trained to correct them. Unfortunately, they are often sparse in the training data, and are expensive to obtain. In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output of a trained detector on video sequences. In particular, detections that are isolated in time, i.e., that have no associated preceding or following detections, are likely to be hard negatives. We describe simple procedures for mining large numbers of such hard negatives (and also hard positives) from unlabeled video data. Our experiments show that retraining detectors on these automatically obtained examples often significantly improves performance. We present experiments on multiple architectures and multiple data sets, including face detection, pedestrian detection and other object categories.

* 14 pages, 7 figures, accepted at ECCV 2018
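
The "isolated in time" idea from the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' procedure: it assumes detections in adjacent frames are associated by box overlap (IoU), and the function names, the `(x1, y1, x2, y2)` box format, and the `iou_threshold` value are hypothetical choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def isolated_detections(frames, iou_threshold=0.5):
    """Return (frame_index, box) for detections with no overlapping
    detection in either the previous or the next frame.

    `frames` is a list of per-frame detection lists. Boundary frames
    only have one neighbour, so they are judged on that neighbour alone.
    """
    isolated = []
    for t, boxes in enumerate(frames):
        prev_boxes = frames[t - 1] if t > 0 else []
        next_boxes = frames[t + 1] if t + 1 < len(frames) else []
        neighbours = prev_boxes + next_boxes
        for box in boxes:
            if not any(iou(box, other) >= iou_threshold for other in neighbours):
                isolated.append((t, box))
    return isolated


# A detection that persists across frames, plus a one-frame "flicker".
frames = [
    [(0, 0, 10, 10)],
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10)],
]
hard_negative_candidates = isolated_detections(frames)
```

Here only the box that appears in the middle frame without a counterpart before or after it is flagged as a hard-negative candidate; the persistent detection is kept out because it overlaps a detection in a neighbouring frame.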
