Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shanmuganathan Raman

Department of Electrical Engineering, Indian Institute of Technology Gandhinagar, India

RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects

Apr 03, 2025

Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar, Shanmuganathan Raman

Abstract:Recent advancements in learning-based methods have opened new avenues for exploring and interpreting art forms, such as shadow art, origami, and sketch art, through computational models. One notable visual art form is 3D Anamorphic Art in which an ensemble of arbitrarily shaped 3D objects creates a realistic and meaningful expression when observed from a particular viewpoint and loses its coherence over the other viewpoints. In this work, we build on insights from 3D Anamorphic Art to perform 3D object arrangement. We introduce RASP, a differentiable-rendering-based framework to arrange arbitrarily shaped 3D objects within a bounded volume via shadow (or silhouette)-guided optimization with an aim of minimal inter-object spacing and near-maximal occupancy. Furthermore, we propose a novel SDF-based formulation to handle inter-object intersection and container extrusion. We demonstrate that RASP can be extended to part assembly alongside object packing considering 3D objects to be "parts" of another 3D object. Finally, we present artistic illustrations of multi-view anamorphic art, achieving meaningful expressions from multiple viewpoints within a single ensemble.

* Conference on Computer Vision and Pattern Recognition (CVPR) 2025

Via

Access Paper or Ask Questions

STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans

Mar 17, 2025

Shashikant Verma, Harish Katti, Soumyaratna Debnath, Yamuna Swamy, Shanmuganathan Raman

Abstract:We introduce STEP, a novel framework utilizing Transformer-based discriminative model prediction for simultaneous tracking and estimation of pose across diverse animal species and humans. We are inspired by the fact that the human brain exploits spatiotemporal continuity and performs concurrent localization and pose estimation despite the specialization of brain areas for form and motion processing. Traditional discriminative models typically require predefined target states for determining model weights, a challenge we address through Gaussian Map Soft Prediction (GMSP) and Offset Map Regression Adapter (OMRA) Modules. These modules remove the necessity of keypoint target states as input, streamlining the process. Our method starts with a known target state initialized through a pre-trained detector or manual initialization in the initial frame of a given video sequence. It then seamlessly tracks the target and estimates keypoints of anatomical importance as output for subsequent frames. Unlike prevalent top-down pose estimation methods, our approach doesn't rely on per-frame target detections due to its tracking capability. This facilitates a significant advancement in inference efficiency and potential applications. We train and validate our approach on datasets encompassing diverse species. Our experiments demonstrate superior results compared to existing methods, opening doors to various applications, including but not limited to action recognition and behavioral analysis.

Via

Access Paper or Ask Questions

L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild

Jan 02, 2025

Soumyaratna Debnath, Harish Katti, Shashikant Verma, Shanmuganathan Raman

Abstract:While 2D pose estimation has advanced our ability to interpret body movements in animals and primates, it is limited by the lack of depth information, constraining its application range. 3D pose estimation provides a more comprehensive solution by incorporating spatial depth, yet creating extensive 3D pose datasets for animals is challenging due to their dynamic and unpredictable behaviours in natural settings. To address this, we propose a hybrid approach that utilizes rigged avatars and the pipeline to generate synthetic datasets to acquire the necessary 3D annotations for training. Our method introduces a simple attention-based MLP network for converting 2D poses to 3D, designed to be independent of the input image to ensure scalability for poses in natural environments. Additionally, we identify that existing anatomical keypoint detectors are insufficient for accurate pose retargeting onto arbitrary avatars. To overcome this, we present a lookup table based on a deep pose estimation method using a synthetic collection of diverse actions rigged avatars perform. Our experiments demonstrate the effectiveness and efficiency of this lookup table-based retargeting approach. Overall, we propose a comprehensive framework with systematically synthesized datasets for lifting poses from 2D to 3D and then utilize this to re-target motion from wild settings onto arbitrary avatars.

* 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

Via

Access Paper or Ask Questions

BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Dec 22, 2024

Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan, Shanmuganathan Raman

Figure 1 for BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Figure 2 for BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Figure 3 for BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Figure 4 for BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Abstract:The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to enhance performance on downstream tasks by strategically sampling a subset of images (the Core-Set) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces sampling time from Open-Set while preserving the quality of samples in the coreset. To achieve this, we utilize Bloom filters as an innovative hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the state-of-the-art fine-grained SSL framework, SimCore [1]. The proposed algorithm drastically outperforms the sampling strategy of the baseline in SimCore [1] with a $98.5\%$ reduction in sampling time with a mere $0.83\%$ average trade-off in accuracy calculated across $11$ downstream datasets.

* Accepted at ICASSP 2025

Via

Access Paper or Ask Questions

$C^{3}$-NeRF: Modeling Multiple Scenes via Conditional-cum-Continual Neural Radiance Fields

Nov 29, 2024

Prajwal Singh, Ashish Tiwari, Gautam Vashishtha, Shanmuganathan Raman

Abstract:Neural radiance fields (NeRF) have exhibited highly photorealistic rendering of novel views through per-scene optimization over a single 3D scene. With the growing popularity of NeRF and its variants, they have become ubiquitous and have been identified as efficient 3D resources. However, they are still far from being scalable since a separate model needs to be stored for each scene, and the training time increases linearly with every newly added scene. Surprisingly, the idea of encoding multiple 3D scenes into a single NeRF model is heavily under-explored. In this work, we propose a novel conditional-cum-continual framework, called $C^{3}$-NeRF, to accommodate multiple scenes into the parameters of a single neural radiance field. Unlike conventional approaches that leverage feature extractors and pre-trained priors for scene conditioning, we use simple pseudo-scene labels to model multiple scenes in NeRF. Interestingly, we observe the framework is also inherently continual (via generative replay) with minimal, if not no, forgetting of the previously learned scenes. Consequently, the proposed framework adapts to multiple new scenes without necessarily accessing the old data. Through extensive qualitative and quantitative evaluation using synthetic and real datasets, we demonstrate the inherent capacity of the NeRF model to accommodate multiple scenes with high-quality novel-view renderings without adding additional parameters. We provide implementation details and dynamic visualizations of our results in the supplementary file.

Via

Access Paper or Ask Questions

LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Sep 01, 2024

Ashish Tiwari, Mihir Sutariya, Shanmuganathan Raman

Figure 1 for LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Figure 2 for LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Figure 3 for LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Figure 4 for LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Abstract:Photometric stereo is a powerful method for obtaining per-pixel surface normals from differently illuminated images of an object. While several methods address photometric stereo with different image (or light) counts ranging from one to two to a hundred, very few focus on learning optimal lighting configuration. Finding an optimal configuration is challenging due to the vast number of possible lighting directions. Moreover, exhaustively sampling all possibilities is impractical due to time and resource constraints. Photometric stereo methods have demonstrated promising performance on existing datasets, which feature limited light directions sparsely sampled from the light space. Therefore, can we optimally utilize these datasets for illumination planning? In this work, we introduce LIPIDS - Learning-based Illumination Planning In Discretized light Space to achieve minimal and optimal lighting configurations for photometric stereo under arbitrary light distribution. We propose a Light Sampling Network (LSNet) that optimizes lighting direction for a fixed number of lights by minimizing the normal loss through a normal regression network. The learned light configurations can directly estimate surface normals during inference, even using an off-the-shelf photometric stereo method. Extensive qualitative and quantitative analyses on synthetic and real-world datasets show that photometric stereo under learned lighting configurations through LIPIDS either surpasses or is nearly comparable to existing illumination planning methods across different photometric stereo backbones.

* Accepted in WACV 2025

Via

Access Paper or Ask Questions

MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Sep 01, 2024

Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman

Figure 1 for MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Figure 2 for MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Figure 3 for MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Figure 4 for MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Abstract:Photometric stereo typically demands intricate data acquisition setups involving multiple light sources to recover surface normals accurately. In this paper, we propose MERLiN, an attention-based hourglass network that integrates single image-based inverse rendering and relighting within a single unified framework. We evaluate the performance of photometric stereo methods using these relit images and demonstrate how they can circumvent the underlying challenge of complex data acquisition. Our physically-based model is trained on a large synthetic dataset containing complex shapes with spatially varying BRDF and is designed to handle indirect illumination effects to improve material reconstruction and relighting. Through extensive qualitative and quantitative evaluation, we demonstrate that the proposed framework generalizes well to real-world images, achieving high-quality shape, material estimation, and relighting. We assess these synthetically relit images over photometric stereo benchmark methods for their physical correctness and resulting normal estimation accuracy, paving the way towards single-shot photometric stereo through physically-based relighting. This work allows us to address the single image-based inverse rendering problem holistically, applying well to both synthetic and real data and taking a step towards mitigating the challenge of data acquisition in photometric stereo.

* Accepted in ECCV 2024

Via

Access Paper or Ask Questions

SS-SfP:Neural Inverse Rendering for Self Supervised Shape from (Mixed) Polarization

Jul 12, 2024

Ashish Tiwari, Shanmuganathan Raman

Abstract:We present a novel inverse rendering-based framework to estimate the 3D shape (per-pixel surface normals and depth) of objects and scenes from single-view polarization images, the problem popularly known as Shape from Polarization (SfP). The existing physics-based and learning-based methods for SfP perform under certain restrictions, i.e., (a) purely diffuse or purely specular reflections, which are seldom in the real surfaces, (b) availability of the ground truth surface normals for direct supervision that are hard to acquire and are limited by the scanner's resolution, and (c) known refractive index. To overcome these restrictions, we start by learning to separate the partially-polarized diffuse and specular reflection components, which we call reflectance cues, based on a modified polarization reflection model and then estimate shape under mixed polarization through an inverse-rendering based self-supervised deep learning framework called SS-SfP, guided by the polarization data and estimated reflectance cues. Furthermore, we also obtain the refractive index as a non-linear least squares solution. Through extensive quantitative and qualitative evaluation, we establish the efficacy of the proposed framework over simple single-object scenes from DeepSfP dataset and complex in-the-wild scenes from SPW dataset in an entirely self-supervised setting. To the best of our knowledge, this is the first learning-based approach to address SfP under mixed polarization in a completely self-supervised framework.

* Published in Pacific Graphics 2023

Via

Access Paper or Ask Questions

Learning Robust Deep Visual Representations from EEG Brain Recordings

Oct 25, 2023

Prajwal Singh, Dwip Dalal, Gautam Vashishtha, Krishna Miyapuram, Shanmuganathan Raman

Abstract:Decoding the human brain has been a hallmark of neuroscientists and Artificial Intelligence researchers alike. Reconstruction of visual images from brain Electroencephalography (EEG) signals has garnered a lot of interest due to its applications in brain-computer interfacing. This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations and subsequently utilize the learned representation for image generation and classification. We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures with supervised and contrastive learning methods. We have performed the zero-shot EEG classification task to support the generalizability claim further. We observed that a subject invariant linearly separable visual representation was learned using EEG data alone in an unimodal setting that gives better k-means accuracy as compared to a joint representation learning between EEG and images. Finally, we propose a novel framework to transform unseen images into the EEG space and reconstruct them with approximation, showcasing the potential for image reconstruction from EEG signals. Our proposed image synthesis method from EEG shows 62.9% and 36.13% inception score improvement on the EEGCVPR40 and the Thoughtviz datasets, which is better than state-of-the-art performance in GAN.

* Accepted in WACV 2024

Via

Access Paper or Ask Questions

Single Image LDR to HDR Conversion using Conditional Diffusion

Jul 06, 2023

Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman

Abstract:Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.

* IEEE International Conference on Image Processing 2023

Via

Access Paper or Ask Questions