Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anuj Singh

CoDe: Blockwise Control for Denoising Diffusion Models

Feb 03, 2025

Anuj Singh, Sayak Mukherjee, Ahmad Beirami, Hadi Jamali-Rad

Abstract:Aligning diffusion models to downstream tasks often requires finetuning new models or gradient-based guidance at inference time to enable sampling from the reward-tilted posterior. In this work, we explore a simple inference-time gradient-free guidance approach, called controlled denoising (CoDe), that circumvents the need for differentiable guidance functions and model finetuning. CoDe is a blockwise sampling method applied during intermediate denoising steps, allowing for alignment with downstream rewards. Our experiments demonstrate that, despite its simplicity, CoDe offers a favorable trade-off between reward alignment, prompt instruction following, and inference cost, achieving a competitive performance against the state-of-the-art baselines. Our code is available at: https://github.com/anujinho/code.

Via

Access Paper or Ask Questions

MAGMA: Manifold Regularization for MAEs

Dec 03, 2024

Alin Dondera, Anuj Singh, Hadi Jamali-Rad

Abstract:Masked Autoencoders (MAEs) are an important divide in self-supervised learning (SSL) due to their independence from augmentation techniques for generating positive (and/or negative) pairs as in contrastive frameworks. Their masking and reconstruction strategy also nicely aligns with SSL approaches in natural language processing. Most MAEs are built upon Transformer-based architectures where visual features are not regularized as opposed to their convolutional neural network (CNN) based counterparts, which can potentially hinder their performance. To address this, we introduce MAGMA, a novel batch-wide layer-wise regularization loss applied to representations of different Transformer layers. We demonstrate that by plugging in the proposed regularization loss, one can significantly improve the performance of MAE-based models. We further demonstrate the impact of the proposed loss on optimizing other generic SSL approaches (such as VICReg and SimCLR), broadening the impact of the proposed approach. Our code base can be found at https://github.com/adondera/magma.

* To be published in WACV 2025

Via

Access Paper or Ask Questions

GeNIe: Generative Hard Negative Images Through Diffusion

Dec 05, 2023

Soroush Abbasi Koohpayegani, Anuj Singh, K L Navaneet, Hadi Jamali-Rad, Hamed Pirsiavash

Abstract:Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Common data augmentation methods are effective, but recent advancements in generative AI, such as diffusion models for image generation, enable more sophisticated augmentation techniques that produce data resembling natural images. We recognize that augmented samples closer to the ideal decision boundary of a classifier are particularly effective and efficient in guiding the learning process. We introduce GeNIe which leverages a diffusion model conditioned on a text prompt to merge contrasting data points (an image from the source category and a text prompt from the target category) to generate challenging samples for the target category. Inspired by recent image editing methods, we limit the number of diffusion iterations and the amount of noise. This ensures that the generated image retains low-level and contextual features from the source image, potentially conflicting with the target category. Our extensive experiments, in few-shot and also long-tail distribution settings, demonstrate the effectiveness of our novel augmentation method, especially benefiting categories with a limited number of examples.

* Our code is available https://github.com/UCDvision/GeNIe

Via

Access Paper or Ask Questions

Self-Attention Message Passing for Contrastive Few-Shot Learning

Oct 12, 2022

Ojas Kishorkumar Shirekar, Anuj Singh, Hadi Jamali-Rad

Figure 1 for Self-Attention Message Passing for Contrastive Few-Shot Learning

Figure 2 for Self-Attention Message Passing for Contrastive Few-Shot Learning

Figure 3 for Self-Attention Message Passing for Contrastive Few-Shot Learning

Figure 4 for Self-Attention Message Passing for Contrastive Few-Shot Learning

Abstract:Humans have a unique ability to learn new representations from just a handful of examples with little to no supervision. Deep learning models, however, require an abundance of data and supervision to perform at a satisfactory level. Unsupervised few-shot learning (U-FSL) is the pursuit of bridging this gap between machines and humans. Inspired by the capacity of graph neural networks (GNNs) in discovering complex inter-sample relationships, we propose a novel self-attention based message passing contrastive learning approach (coined as SAMP-CLR) for U-FSL pre-training. We also propose an optimal transport (OT) based fine-tuning strategy (we call OpT-Tune) to efficiently induce task awareness into our novel end-to-end unsupervised few-shot classification framework (SAMPTransfer). Our extensive experimental results corroborate the efficacy of SAMPTransfer in a variety of downstream few-shot classification scenarios, setting a new state-of-the-art for U-FSL on both miniImagenet and tieredImagenet benchmarks, offering up to 7%+ and 5%+ improvements, respectively. Our further investigations also confirm that SAMPTransfer remains on-par with some supervised baselines on miniImagenet and outperforms all existing U-FSL baselines in a challenging cross-domain scenario. Our code can be found in our GitHub repository at https://github.com/ojss/SAMPTransfer/.

Via

Access Paper or Ask Questions

Transductive Decoupled Variational Inference for Few-Shot Classification

Aug 22, 2022

Anuj Singh, Hadi Jamali-Rad

Figure 1 for Transductive Decoupled Variational Inference for Few-Shot Classification

Figure 2 for Transductive Decoupled Variational Inference for Few-Shot Classification

Figure 3 for Transductive Decoupled Variational Inference for Few-Shot Classification

Figure 4 for Transductive Decoupled Variational Inference for Few-Shot Classification

Abstract:The versatility to learn from a handful of samples is the hallmark of human intelligence. Few-shot learning is an endeavour to transcend this capability down to machines. Inspired by the promise and power of probabilistic deep learning, we propose a novel variational inference network for few-shot classification (coined as TRIDENT) to decouple the representation of an image into semantic and label latent variables, and simultaneously infer them in an intertwined fashion. To induce task-awareness, as part of the inference mechanics of TRIDENT, we exploit information across both query and support images of a few-shot task using a novel built-in attention-based transductive feature extraction module (we call AttFEX). Our extensive experimental results corroborate the efficacy of TRIDENT and demonstrate that, using the simplest of backbones, it sets a new state-of-the-art in the most commonly adopted datasets miniImageNet and tieredImageNet (offering up to 4% and 5% improvements, respectively), as well as for the recent challenging cross-domain miniImagenet --> CUB scenario offering a significant margin (up to 20% improvement) beyond the best existing cross-domain baselines. Code and experimentation can be found in our GitHub repository: https://github.com/anujinho/trident

Via

Access Paper or Ask Questions