Maheswar Bora

DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation

Feb 02, 2026

Tushar Anand, Maheswar Bora, Antitza Dantcheva, Abhijit Das

Abstract: In this work, we propose a novel Mamba block, DenVisCoM, as well as a novel hybrid architecture specifically tailored for accurate and real-time estimation of optical flow and disparity. Given that such multi-view geometry and motion tasks are fundamentally related, we propose a unified architecture to tackle them jointly. Specifically, the proposed hybrid architecture is based on DenVisCoM and a Transformer-based attention block, and it addresses real-time inference, memory footprint, and accuracy at the same time for joint estimation of motion and 3D dense perception tasks. We extensively analyze the trade-off between accuracy and real-time processing on a large number of benchmark datasets. Our experimental results and related analysis suggest that the proposed model can accurately estimate optical flow and disparity in real time. All models and associated code are available at https://github.com/vimstereo/DenVisCoM.

* IEEE International Conference on Robotics and Automation 2026

ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation

Dec 21, 2024

Maheswar Bora, Tushar Anand, Saurabh Atreya, Aritra Mukherjee, Abhijit Das

Abstract: In this work, we propose a Visual Mamba (ViM) based architecture to resolve the existing trade-off between real-time inference, accuracy, and low computational overhead in disparity map generation (DMG). Moreover, we propose a performance measure that jointly evaluates the inference speed, computational overhead, and accuracy of a DMG model.

KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder

Nov 19, 2024

Maheswar Bora, Saurabh Atreya, Aritra Mukherjee, Abhijit Das

Abstract: In this work, we attempt to extend the self-supervised learning (SSL) paradigm and showcase a way forward for it by combining contrastive learning, self-distillation (knowledge distillation), and masked data modelling, the three major SSL frameworks, to learn a joint and coordinated representation. The proposed technique learns through the collaboration of these different SSL objectives. To learn the objectives jointly, we propose a new SSL architecture, KDC-MAE, together with a complementary masking strategy to learn the modular correspondence and a weighted scheme to combine the objectives in a coordinated way. Experimental results indicate that the contrastive masking correspondence, together with the KD learning objective, improves learning for multiple modalities over multiple tasks.
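
The abstract names two ingredients, a complementary masking strategy and a weighted combination of objectives, without giving details. The following is a minimal illustrative sketch of one common reading of those ideas, not the authors' implementation: the function names `complementary_masks` and `combined_loss`, the 0.5 mask ratio, and the weights are all assumptions introduced here.

```python
import random


def complementary_masks(num_patches, mask_ratio=0.5, seed=0):
    """Return two boolean masks (True = patch is masked).

    Hypothetical helper: the second mask is the exact complement of the
    first, so every patch is visible in exactly one of the two views.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    n_masked = int(num_patches * mask_ratio)
    masked_a = set(order[:n_masked])
    mask_a = [i in masked_a for i in range(num_patches)]
    mask_b = [not m for m in mask_a]
    return mask_a, mask_b


def combined_loss(l_contrastive, l_distill, l_recon, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three scalar losses (the weights are assumed values)."""
    w_c, w_d, w_r = weights
    return w_c * l_contrastive + w_d * l_distill + w_r * l_recon


mask_a, mask_b = complementary_masks(num_patches=16)
total = combined_loss(1.0, 2.0, 3.0, weights=(0.5, 0.25, 1.0))
```

In this sketch each view hides the patches the other view sees, and the three loss terms stand in for the contrastive, distillation, and reconstruction objectives; how the paper actually builds the masks and sets the weights is not stated in the abstract.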

Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN

Jan 05, 2024

Saurabh Atreya, Maheswar Bora, Aritra Mukherjee, Abhijit Das

Abstract: This work proposes a novel process that uses the 3D trajectories of the pen tip and tail for air signatures. To acquire the trajectories, we developed a new pen tool and used a stereo camera. We propose SliT-CNN, a novel 2D spatio-temporal convolutional neural network (CNN) for better feature extraction from air signatures. In addition, we collected an air signature dataset from 45 signers, along with skilled forgery signatures for each user. Detailed benchmarking of the proposed dataset using existing techniques, and of the proposed CNN on existing and proposed datasets, demonstrates the effectiveness of our methodology.

* Accepted and presented at IJCB 2023
