Sanjeev J. Koppal

ClipRover: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots

Feb 12, 2025

Yuxuan Zhang, Adnan Abdullah, Sanjeev J. Koppal, Md Jahidul Islam

Abstract: Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration and path planning, with exploration relying on inefficient algorithms due to limited (partially observed) environmental information. In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of a vision-language model named CLIP. Our approach requires only monocular vision and operates without any prior map or knowledge about the target. For comprehensive evaluations, we design the functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on prior map and target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.

* V1, 21 pages

FoveaSPAD: Exploiting Depth Priors for Adaptive and Efficient Single-Photon 3D Imaging

Dec 03, 2024

Justin Folden, Atul Ingle, Sanjeev J. Koppal

Abstract: Fast, efficient, and accurate depth-sensing is important for safety-critical applications such as autonomous vehicles. Direct time-of-flight LiDAR has the potential to fulfill these demands, thanks to its ability to provide high-precision depth measurements at long standoff distances. While conventional LiDAR relies on avalanche photodiodes (APDs), single-photon avalanche diodes (SPADs) are an emerging image-sensing technology that offers many advantages such as extreme sensitivity and time resolution. In this paper, we remove the key challenges to widespread adoption of SPAD-based LiDARs: their susceptibility to ambient light and the large amount of raw photon data that must be processed to obtain in-pixel depth estimates. We propose new algorithms and sensing policies that improve signal-to-noise ratio (SNR) and increase computing and memory efficiency for SPAD-based LiDARs. During capture, we use external signals to foveate, i.e., guide how the SPAD system estimates scene depths. This foveated approach allows our method to "zoom into" the signal of interest, reducing the amount of raw photon data that needs to be stored and transferred from the SPAD sensor, while also improving resilience to ambient light. We show results both in simulation and with real hardware emulation, with specific implementations achieving a 1548-fold reduction in memory usage, and our algorithms can be applied to newly available and future SPAD arrays.

Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

Jun 27, 2024

Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman(+5 more)

Abstract: Hyperspectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many others. However, the high cost, large physical size and complicated operation process prevent hyperspectral cameras from being employed in many applications and research fields. In this paper, we introduce a cost-efficient, compact and easy-to-use active illumination camera that may benefit many applications. We developed a fully functional prototype of such a camera. With the hope of helping with agricultural research, we tested our camera for plant root imaging. In addition, a U-Net model for spectral reconstruction was trained by using a reference hyperspectral camera's data as ground truth and our camera's data as input. We demonstrated our camera's ability to obtain additional information over a typical RGB camera. In addition, the ability to reconstruct hyperspectral data from multi-spectral input makes our device compatible with models and algorithms developed for hyperspectral applications with no modifications required.

SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera

Jun 06, 2024

Jackson Arnold, Sophia Rossi, Chloe Petrosino, Ethan Mitchell, Sanjeev J. Koppal

Abstract: Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, a key challenge of hyperspectral and multispectral imaging is its large data footprint. We propose both a novel camera design and a vision transformer (ViT)-based algorithm that alleviate both the captured data footprint and the computational load for hyperspectral segmentation. Our camera is able to adaptively sample image regions or patches at different resolutions, instead of capturing the entire hyperspectral cube at one high resolution. Our segmentation algorithm works in concert with the camera, applying ViT-based segmentation only to adaptively selected patches. We show results both in simulation and on a real hardware platform demonstrating both accurate segmentation results and reduced computational burden.

Schrödinger's Camera: First Steps Towards a Quantum-Based Privacy Preserving Camera

Mar 13, 2023

Hannah Kirkland, Sanjeev J. Koppal

Abstract: Privacy-preserving vision must overcome the dual challenge of utility and privacy. Too much anonymity renders the images useless, but too little privacy does not protect sensitive data. We propose a novel design for privacy preservation, where the imagery is stored in quantum states. In the future, this will be enabled by quantum imaging cameras, and, currently, storing very low resolution imagery in quantum states is possible. Quantum state imagery has the advantage of being both private and non-private until the point of measurement. This occurs even when images are manipulated, since every quantum action is fully reversible. We propose a control algorithm, based on double deep Q-learning, to learn how to anonymize the image before measurement. After learning, the RL weights are fixed, and new attack neural networks are trained from scratch to break the system's privacy. Although all our results are in simulation, we demonstrate, with these first steps, that it is possible to control both privacy and utility in a quantum-based manner.

Design of an Adaptive Lightweight LiDAR to Decouple Robot-Camera Geometry

Feb 28, 2023

Yuyang Chen, Dingkang Wang, Lenworth Thomas, Karthik Dantu, Sanjeev J. Koppal

Abstract: A fundamental challenge in robot perception is the coupling of the sensor pose and robot pose. This has led to research in active vision where robot pose is changed to reorient the sensor to areas of interest for perception. Further, egomotion such as jitter and external effects such as wind affect perception, requiring additional effort in software such as image stabilization. This effect is particularly pronounced in micro-air vehicles and micro-robots, which are typically lighter and subject to larger jitter but do not have the computational capability to perform stabilization in real time. We present a novel microelectromechanical (MEMS) mirror LiDAR system to change the field of view of the LiDAR independently of the robot motion. Our design has the potential for use on small, low-power systems where the expensive components of the LiDAR can be placed external to the small robot. We show the utility of our approach in simulation and on prototype hardware mounted on a UAV. We believe that this LiDAR and its compact movable scanning design provide mechanisms to decouple robot and sensor geometry, allowing us to simplify robot perception. We also demonstrate examples of motion compensation using IMU and external odometry feedback in hardware.

SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing

Mar 26, 2021

Brevin Tilmon, Sanjeev J. Koppal

Abstract: Most monocular depth sensing methods use conventionally captured images that are created without considering scene content. In contrast, animal eyes have fast mechanical motions, called saccades, that control how the scene is imaged by the fovea, where resolution is highest. In this paper, we present the SaccadeCam framework for adaptively distributing resolution onto regions of interest in the scene. Our algorithm for adaptive resolution is a self-supervised network and we demonstrate results for end-to-end learning for monocular depth estimation. We also show preliminary results with a real SaccadeCam hardware prototype.

A MEMS-based Foveating LIDAR to enable Real-time Adaptive Depth Sensing

Mar 21, 2020

Francesco Pittaluga, Zaid Tasneem, Justin Folden, Brevin Tilmon, Ayan Chakrabarti, Sanjeev J. Koppal

Abstract: Most active depth sensors sample their visual field using a fixed pattern, decided by accuracy, speed and cost trade-offs, rather than scene content. However, a number of recent works have demonstrated that adapting measurement patterns to scene content can offer significantly better trade-offs. We propose a hardware LIDAR design that allows flexible real-time measurements according to dynamically specified measurement patterns. Our flexible depth sensor design consists of a controllable scanning LIDAR that can foveate, or increase resolution in regions of interest, and that can fully leverage the power of adaptive depth sensing. We describe our optical setup and calibration, which enables fast sparse depth measurements using a scanning MEMS (micro-electro-mechanical) mirror. We validate the efficacy of our prototype LIDAR design by testing on over 75 static and dynamic scenes spanning a range of environments. We also show CNN-based depth-map completion of sparse measurements obtained by our sensor. Our experiments show that our sensor can realize adaptive depth sensing systems.

* 17 pages, 6 figures, project site: https://www.fpittaluga.com/adaptivelidar

Revealing Scenes by Inverting Structure from Motion Reconstructions

Apr 05, 2019

Francesco Pittaluga, Sanjeev J. Koppal, Sing Bing Kang, Sudipta N. Sinha

Abstract: Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and optionally color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature inversion methods, we deal with highly sparse and irregular 2D point distributions and inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor image source and the 3D point visibility. We evaluate our attack algorithm on public datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.

* 10 pages, 8 figures, to be published in IEEE Conference on Computer Vision and Pattern Recognition 2019

Learning Privacy Preserving Encodings through Adversarial Training

Jun 13, 2018

Francesco Pittaluga, Sanjeev J. Koppal, Ayan Chakrabarti

Abstract: We present a framework to learn privacy-preserving encodings of images (or other high-dimensional data) to inhibit inference of a chosen private attribute. Rather than encoding a fixed dataset or inhibiting a fixed estimator, we aim to learn an encoding function such that even after this function is fixed, an estimator with knowledge of the encoding is unable to learn to accurately predict the private attribute when generalizing beyond a training set. We formulate this as adversarial optimization of an encoding function against a classifier for the private attribute, with both modeled as deep neural networks. We describe an optimization approach which successfully yields an encoder that permanently limits inference of the private attribute, while preserving either a generic notion of information, or the estimation of a different, desired, attribute. We experimentally validate the efficacy of our approach on private tasks of real-world complexity, by learning to prevent detection of scene classes from the Places-365 dataset.
