Prarthana Bhattacharyya

Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables

Mar 10, 2025

Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Oliver Powell, Benjamin Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, Taru Muhonen(+2 more)

Abstract: We present an advance in wearable technology: a mobile-optimized, real-time, ultra-low-power event camera system that enables natural hand gesture control for smart glasses, dramatically improving user experience. While hand gesture recognition in computer vision has advanced significantly, critical challenges remain in creating systems that are intuitive, adaptable across diverse users and environments, and energy-efficient enough for practical wearable applications. Our approach tackles these challenges through carefully selected microgestures: lateral thumb swipes across the index finger (in both directions) and a double pinch between thumb and index fingertips. These human-centered interactions leverage natural hand movements, ensuring intuitive usability without requiring users to learn complex command sequences. To overcome variability in users and environments, we developed a novel simulation methodology that enables comprehensive domain sampling without extensive real-world data collection. Our power-optimised architecture maintains exceptional performance, achieving F1 scores above 80% on benchmark datasets featuring diverse users and environments. The resulting models operate at just 6-8 mW when exploiting the Qualcomm Snapdragon Hexagon DSP, with our 2-channel implementation exceeding 70% F1 accuracy and our 6-channel model surpassing 80% F1 accuracy across all gesture classes in user studies. These results were achieved using only synthetic training data. This improves on the state-of-the-art F1 accuracy by 20% with a 25x power reduction when using the DSP. This advancement brings the deployment of ultra-low-power vision systems in wearable devices closer and opens new possibilities for seamless human-computer interaction.

* 15 pages, 17 figures. Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, and Oliver Powell contributed equally to this paper
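The abstract does not say how the 2-channel and 6-channel model inputs are constructed. A common, generic way to feed event-camera output to a CNN is to accumulate the events from a short time window into per-pixel count histograms, one channel per polarity. The sketch below shows only that generic 2-channel representation; the function name, the (x, y, t, polarity) tuple layout and the windowing scheme are assumptions, not the Helios 2.0 pipeline.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp_seconds, polarity), polarity +1 or -1.
Event = Tuple[int, int, float, int]


def events_to_two_channel_frame(events: List[Event], width: int, height: int,
                                t_start: float, t_end: float) -> List[List[List[int]]]:
    """Count events per pixel in [t_start, t_end), one channel per polarity.

    Returns frame[channel][y][x], where channel 0 holds positive (brightness
    increase) events and channel 1 holds negative (brightness decrease) events.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, polarity in events:
        # Skip events outside the time window or outside the sensor bounds.
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            channel = 0 if polarity > 0 else 1
            frame[channel][y][x] += 1
    return frame


# Usage: the last event falls outside the 10 ms window and is ignored.
events = [(0, 0, 0.001, 1), (0, 0, 0.002, 1), (1, 0, 0.003, -1), (1, 1, 0.020, 1)]
frame = events_to_two_channel_frame(events, width=2, height=2, t_start=0.0, t_end=0.01)
# frame[0][0][0] == 2 and frame[1][0][1] == 1
```

Each resulting frame is a fixed-size tensor, which is what a CNN expects; the sparsity of the event stream is what keeps such inputs cheap to compute on low-power hardware.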

Helios: An extremely low power event-based gesture recognition for always-on smart eyewear

Jul 11, 2024

Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Ben Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, Dave Trickett, Chris Mair(+5 more)

Abstract: This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day use on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice controls, present limitations in ergonomics, privacy and power consumption. Helios addresses these challenges by leveraging natural hand interactions for a more intuitive and comfortable user experience. Our system uses an extremely low-power, compact 3 mm x 4 mm, 20 mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on an NXP Nano UltraLite compute platform, consuming less than 350 mW. Helios can recognize seven classes of gestures, including subtle microgestures like swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a remarkably low latency of 60 ms. Our user testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.

* 18 pages, 10 figures. First three authors contributed equally to this paper

SSL-Interactions: Pretext Tasks for Interactive Trajectory Prediction

Jan 15, 2024

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: This paper addresses motion forecasting in multi-agent environments, which is pivotal for ensuring the safety of autonomous vehicles. Traditional as well as recent data-driven marginal trajectory prediction methods struggle to properly learn non-linear agent-to-agent interactions. We present SSL-Interactions, which proposes pretext tasks to enhance interaction modeling for trajectory prediction. We introduce four interaction-aware pretext tasks to encapsulate various aspects of agent interactions: range gap prediction, closest distance prediction, direction of movement prediction, and type of interaction prediction. We further propose an approach to curate interaction-heavy scenarios from datasets. This curated data has two advantages: it provides a stronger learning signal to the interaction model, and it facilitates the generation of pseudo-labels for interaction-centric pretext tasks. We also propose three new metrics specifically designed to evaluate predictions in interactive scenes. Our empirical evaluations indicate that, for interaction-heavy scenarios, SSL-Interactions outperforms state-of-the-art motion forecasting methods both quantitatively, with up to 8% improvement, and qualitatively.

* 13 pages, 5 figures, submitted to IV-2024
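Pseudo-labels for interaction-centric pretext tasks can be computed from recorded trajectories alone, without manual annotation. As an illustration, the sketch below computes one plausible "closest distance" label: the minimum Euclidean distance between two agents over their shared timesteps, together with the timestep at which it occurs. This exact definition, the function name and the 2D point format are assumptions; the abstract does not give the paper's label definitions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) position in metres


def closest_distance(traj_a: List[Point], traj_b: List[Point]) -> Tuple[float, int]:
    """Minimum distance between two agents sampled at the same timesteps.

    Returns (distance, timestep_index). If the trajectories differ in length,
    only the shared prefix is compared.
    """
    steps = min(len(traj_a), len(traj_b))
    if steps == 0:
        raise ValueError("trajectories must be non-empty")
    best_d, best_t = math.inf, -1
    for t in range(steps):
        (xa, ya), (xb, yb) = traj_a[t], traj_b[t]
        d = math.hypot(xa - xb, ya - yb)
        if d < best_d:  # strict comparison keeps the earliest minimum on ties
            best_d, best_t = d, t
    return best_d, best_t


# Usage: the agents are closest (3.0 m apart) at timestep 1.
ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(3.0, 4.0), (1.0, 3.0), (2.0, 5.0)]
distance, step = closest_distance(ego, other)
```

A model trained to regress such a label from the encoded scene is pushed to represent where agents come close to each other, which is the kind of interaction signal the pretext tasks target.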

SSL-Lanes: Self-Supervised Learning for Motion Forecasting in Autonomous Driving

Jun 28, 2022

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: Self-supervised learning (SSL) is an emerging technique that has been successfully employed to train convolutional neural networks (CNNs) and graph neural networks (GNNs) for more transferable, generalizable, and robust representation learning. However, its potential in motion forecasting for autonomous driving has rarely been explored. In this study, we report the first systematic exploration and assessment of incorporating self-supervision into motion forecasting. First, we propose four novel self-supervised learning tasks for motion forecasting, supported by theoretical rationale and by quantitative and qualitative comparisons on the challenging large-scale Argoverse dataset. Second, we point out that our auxiliary SSL-based learning setup not only outperforms forecasting methods that use transformers, complicated fusion mechanisms and sophisticated online dense goal candidate optimization algorithms in terms of accuracy, but also has low inference time and architectural complexity. Lastly, we conduct several experiments to understand why SSL improves motion forecasting. Code is open-sourced at https://github.com/AutoVision-cloud/SSL-Lanes.

* 16 pages, 7 figures
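An auxiliary SSL setup is often written as a weighted sum of the main task loss and a pretext-task loss. The form below is an assumed general template rather than the paper's exact objective; the weight λ is a hypothetical hyperparameter and the loss names are illustrative:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{forecast}} + \lambda \, \mathcal{L}_{\text{SSL}}, \qquad \lambda \ge 0
```

With λ = 0 this reduces to training the forecaster alone. In such a setup the pretext head typically contributes only a training signal and can be dropped at inference, so it need not add inference-time cost.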

Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime

Jan 30, 2022

Prarthana Bhattacharyya, Chenge Li, Xiaonan Zhao, István Fehérvári, Jason Sun

Abstract: Self-supervision has shown outstanding results for natural language processing and, more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions on various computer vision tasks. In this paper, we are the first to ask whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable and semantically meaningful embeddings. For few-shot image classification, we train SSL-ViTs without any supervision on external data, and use this trained embedder to adapt quickly to novel classes with a limited number of labels. For zero-shot image retrieval, we use SSL-ViTs pre-trained on a large dataset without any labels and fine-tune them with several metric learning objectives. Our self-supervised attention representations outperform the state-of-the-art on several public benchmarks for both tasks: by up to 6-10% on miniImageNet and CUB200 for few-shot image classification, and by up to 4-11% on Stanford Online Products, Cars196 and CUB200 for zero-shot image retrieval. Code is available at https://github.com/AutoVision-cloud/SSL-ViT-lowlabel-highdata.

* Accepted to ICASSP-2022
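One simple way to adapt a frozen, self-supervised embedder to novel classes with only a few labels is nearest-centroid classification: average the embeddings of the labelled support examples for each class, then assign each query to the most similar centroid. The abstract does not specify which classifier is used on top of the SSL-ViT features, so this prototype-style sketch, its cosine-similarity choice and the function names are assumptions; in practice the vectors would be embeddings produced by the pretrained ViT.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def class_centroids(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Mean of the L2-normalised support embeddings for each novel class."""
    centroids = {}
    for label, vectors in support.items():
        normed = [_normalize(v) for v in vectors]
        dim = len(normed[0])
        centroids[label] = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return centroids


def classify(query: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assign the query to the class whose centroid has the highest cosine similarity."""
    q = _normalize(query)

    def cosine(centroid: List[float]) -> float:
        return sum(a * b for a, b in zip(q, _normalize(centroid)))

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Usage with toy 2-D "embeddings" and two labelled examples per class (2-shot).
support = {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
centroids = class_centroids(support)
label = classify([0.8, 0.2], centroids)
```

No parameters are trained for the novel classes here; all adaptation comes from the quality of the embedding space, which is why better self-supervised representations translate directly into few-shot accuracy in this kind of setup.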

3D Scene Understanding at Urban Intersection using Stereo Vision and Digital Map

Dec 10, 2021

Prarthana Bhattacharyya, Yanlei Gu, Jiali Bao, Xu Liu, Shunsuke Kamijo

Abstract: Driving behavior at urban intersections is very complex. It is thus crucial for autonomous vehicles to comprehensively understand challenging urban traffic scenes in order to navigate intersections and prevent accidents. In this paper, we introduce an approach based on stereo vision and a 3D digital map to spatially and temporally analyze the traffic situation at urban intersections. Stereo vision is used to detect, classify and track obstacles, while the 3D digital map is used to improve ego-localization and provide context in terms of road-layout information. We present a probabilistic approach that temporally integrates these geometric, semantic, dynamic and contextual cues. We qualitatively and quantitatively evaluate our proposed technique on real traffic data collected in an urban canyon in Tokyo to demonstrate the efficacy of the system in providing comprehensive awareness of the traffic surroundings.

* 2017 IEEE 85th Vehicular Technology Conference (VTC Spring)
* 6 pages, 6 figures
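The abstract does not detail the probabilistic model, but temporally integrating noisy per-frame cues is commonly done with a recursive Bayes filter. The sketch below shows one generic predict-update step over a small discrete state space (for example, which lane of the intersection a tracked obstacle occupies). The state space, transition model and observation likelihoods are hypothetical placeholders standing in for the paper's geometric, semantic, dynamic and contextual cues.

```python
from typing import List


def bayes_filter_step(belief: List[float], transition: List[List[float]],
                      likelihood: List[float]) -> List[float]:
    """One predict-update cycle of a discrete Bayes filter.

    belief[i]: prior probability of state i at time t.
    transition[i][j]: P(state j at t+1 | state i at t).
    likelihood[j]: P(current observation | state j).
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight each state by how well it explains the new observation.
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(posterior)
    if total == 0:
        raise ValueError("observation has zero probability under every state")
    return [p / total for p in posterior]


# Usage: two hypothetical states, a static transition model, and two
# consecutive observations that both favour state 0.
stay = [[1.0, 0.0], [0.0, 1.0]]
belief = bayes_filter_step([0.5, 0.5], stay, [0.9, 0.1])
belief = bayes_filter_step(belief, stay, [0.9, 0.1])
```

Because each step reuses the previous posterior as its prior, consistent evidence accumulates over frames while a single noisy detection only shifts the belief partway.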

Self-Attention Based Context-Aware 3D Object Detection

Jan 07, 2021

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: Most existing point-cloud based 3D object detectors use convolution-like operators to process information in a local neighbourhood with fixed-weight kernels and aggregate global context hierarchically. However, recent work on non-local neural networks and self-attention for 2D vision has shown that explicitly modeling global context and long-range interactions between positions can lead to more robust and competitive models. In this paper, we explore two variants of self-attention for contextual modeling in 3D object detection by augmenting convolutional features with self-attention features. We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors and show consistent improvement over strong baseline models while simultaneously significantly reducing their parameter footprint and computational cost. We also propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations. This not only allows us to scale explicit global contextual modeling to larger point clouds, but also leads to more discriminative and informative feature descriptors. Our method can be flexibly applied to most state-of-the-art detectors, increasing accuracy as well as parameter and compute efficiency. We achieve new state-of-the-art detection performance on the KITTI and nuScenes datasets. Code is available at https://github.com/AutoVision-cloud/SA-Det3D.

* 17 pages, 9 figures
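The pairwise variant described above augments local convolutional features with features computed by self-attention over all positions, so every position can draw on global context. A minimal sketch of that idea uses standard single-head scaled dot-product attention followed by concatenation. The projection matrices stand in for learned weights, and the function names, the concatenation choice and the absence of multiple heads and biases are assumptions, not the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_self_attention(features: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Scaled dot-product self-attention over N feature vectors (rows of `features`)."""
    q, k, v = matmul(features, w_q), matmul(features, w_k), matmul(features, w_v)
    scale = 1.0 / math.sqrt(len(k[0]))
    # scores[i][j]: how strongly position i attends to position j.
    scores = [[scale * sum(qi * kj for qi, kj in zip(q[i], k[j])) for j in range(len(k))]
              for i in range(len(q))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of all value rows


def augment_with_context(features: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Concatenate each local (e.g. convolutional) feature with its global-context feature."""
    context = pairwise_self_attention(features, w_q, w_k, w_v)
    return [f + c for f, c in zip(features, context)]


# Usage: three 2-D features and identity projections give 4-D augmented features.
identity = [[1.0, 0.0], [0.0, 1.0]]
augmented = augment_with_context([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], identity, identity, identity)
```

The attention cost grows quadratically with the number of positions, which is the scaling problem the abstract's second, sampling-based variant is meant to address for larger point clouds.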

Deformable PV-RCNN: Improving 3D Object Detection with Learned Deformations

Aug 20, 2020

Prarthana Bhattacharyya, Krzysztof Czarnecki

Abstract: We present Deformable PV-RCNN, a high-performing point-cloud based 3D object detector. Currently, the proposal refinement methods used by the state-of-the-art two-stage detectors cannot adequately accommodate differing object scales, varying point-cloud density, part-deformation and clutter. We present a proposal refinement module inspired by 2D deformable convolution networks that can adaptively gather instance-specific features from locations where informative content exists. We also propose a simple context gating mechanism which allows the keypoints to select relevant context information for the refinement stage. We show state-of-the-art results on the KITTI dataset.

* Accepted at ECCV 2020 Workshop on Perception for Autonomous Driving
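The refinement module is described as adaptively gathering features from informative locations, in the spirit of 2D deformable convolution. The core operation of deformable convolution can be sketched as sampling a feature map at a regular pattern of positions, each shifted by a learned fractional offset, using bilinear interpolation. This 2D toy version, its function names and the zero-padding at the borders are assumptions; the paper applies the idea to point-cloud keypoint features rather than a plain 2D grid, and the offsets would be predicted by a network.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # feature map indexed as grid[y][x]
Offset = Tuple[float, float]


def bilinear_sample(grid: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate a 2D feature map at fractional (x, y); zero outside the map."""
    h, w = len(grid), len(grid[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                # Weight falls off linearly with distance to each of the 4 neighbours.
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * grid[yi][xi]
    return value


def deformable_gather(grid: Grid, center: Offset, base_offsets: List[Offset],
                      learned_offsets: List[Offset]) -> List[float]:
    """Sample features at a regular pattern around `center`, shifted by learned offsets."""
    cx, cy = center
    return [bilinear_sample(grid, cx + bx + dx, cy + by + dy)
            for (bx, by), (dx, dy) in zip(base_offsets, learned_offsets)]


# Usage: the first sample is shifted to the middle of the 2x2 map and averages all cells.
grid = [[0.0, 1.0], [2.0, 3.0]]
samples = deformable_gather(grid, (0.0, 0.0), [(0, 0), (1, 1)], [(0.5, 0.5), (0.0, 0.0)])
```

Because bilinear interpolation is differentiable with respect to the sampling position, gradients can flow back into the offsets, which is what lets a network learn where the informative content lies.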

FANTrack: 3D Multi-Object Tracking with Feature Association Network

May 07, 2019

Erkan Baser, Venkateshwaran Balasubramanian, Prarthana Bhattacharyya, Krzysztof Czarnecki

Abstract: We propose a data-driven approach to online multi-object tracking (MOT) that uses a convolutional neural network (CNN) for data association in a tracking-by-detection framework. The problem of multi-target tracking aims to assign noisy detections to an a priori unknown and time-varying number of tracked objects across a sequence of frames. A majority of the existing solutions focus on either tediously designing cost functions or formulating the task of data association as a complex optimization problem that can be solved effectively. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects. Our solution learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train. We evaluate our approach on the challenging KITTI dataset and show competitive results. Our code is available at https://git.uwaterloo.ca/wise-lab/fantrack.

* 8 pages, 10 figures, IEEE Intelligent Vehicles Symposium (IV 19)
