Archan Misra

Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing

Jun 14, 2025

Nuwan Bandara, Thivya Kandappu, Archan Misra

Abstract:Event-based eye tracking holds significant promise for fine-grained cognitive state inference, offering high temporal resolution and robustness to motion artifacts, critical features for decoding subtle mental states such as attention, confusion, or fatigue. In this work, we introduce a model-agnostic, inference-time refinement framework designed to enhance the output of existing event-based gaze estimation models without modifying their architecture or requiring retraining. Our method comprises two key post-processing modules: (i) Motion-Aware Median Filtering, which suppresses blink-induced spikes while preserving natural gaze dynamics, and (ii) Optical Flow-Based Local Refinement, which aligns gaze predictions with cumulative event motion to reduce spatial jitter and temporal discontinuities. To complement traditional spatial accuracy metrics, we propose a novel Jitter Metric that captures the temporal smoothness of predicted gaze trajectories based on velocity regularity and local signal complexity. Together, these contributions significantly improve the consistency of event-based gaze signals, making them better suited for downstream tasks such as micro-expression analysis and mind-state decoding. Our results demonstrate consistent improvements across multiple baseline models on controlled datasets, laying the groundwork for future integration with multimodal affect recognition systems in real-world environments.

* 18 pages
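
The abstract above describes a median filter that suppresses blink-induced spikes while leaving ordinary gaze motion untouched, but it does not give the filter's details. As a rough illustration only, the sketch below substitutes a simple fixed-threshold median filter: a sample is replaced by its local median only when it deviates from that median by more than a threshold. The function name `spike_aware_median`, the `window` and `threshold` parameters, and the sample trace are all hypothetical and are not taken from the paper.

```python
from statistics import median


def spike_aware_median(xs, window=5, threshold=10.0):
    """Replace a sample with its local median only when it looks like a spike.

    Assumption (not from the paper): a "spike" is any sample that differs
    from the median of its surrounding window by more than `threshold`.
    Samples within the threshold are passed through unchanged, so smooth
    gaze motion is preserved.
    """
    half = window // 2
    out = []
    for i, x in enumerate(xs):
        # The window is truncated at the sequence boundaries.
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        m = median(xs[lo:hi])
        out.append(m if abs(x - m) > threshold else x)
    return out


# Hypothetical horizontal gaze coordinate (pixels) with one blink-like spike.
trace = [100.0, 101.0, 102.0, 180.0, 104.0, 105.0, 106.0]
smoothed = spike_aware_median(trace)
print(smoothed)
```

Only the spike at index 3 is altered; a plain median filter would instead modify every sample, including those on the slow drift.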

Event-Based Eye Tracking. 2025 Event-based Vision Workshop

Apr 25, 2025

Qinyu Chen, Chang Gao, Min Liu, Daniele Perrone, Yan Ru Pei, Zuowen Wang, Zhuo Zou, Shihang Tan, Tao Han, Guorui Lu(+22 more)

Abstract:This survey reviews the 2025 Event-Based Eye Tracking Challenge, organized as part of the 2025 CVPR Event-Based Vision Workshop. The challenge focuses on predicting the pupil center by processing eye movements recorded with an event camera. We review and summarize the innovative methods from the top-ranked teams in the challenge to advance future event-based eye tracking research. For each method, accuracy, model size, and number of operations are reported. We also discuss event-based eye tracking from the perspective of hardware design.

NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results

Apr 16, 2025

Lei Sun, Andrea Alfarano, Peiqi Duan, Shaolin Su, Kaiwei Wang, Boxin Shi, Radu Timofte, Danda Pani Paudel, Luc Van Gool, Qinglin Liu(+78 more)

Abstract:This paper presents an overview of the NTIRE 2025 First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on computational complexity or model size. The task focuses on leveraging both events and images as inputs for single-image deblurring. A total of 199 participants registered, among whom 15 teams successfully submitted valid results, offering valuable insights into the current state of event-based image deblurring. We anticipate that this challenge will drive further advancements in event-based vision research.
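
For readers unfamiliar with the evaluation metric named above, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and restored images. The sketch below computes it over flat lists of pixel values. The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions for illustration, and the challenge's exact evaluation protocol (color handling, cropping, averaging across images) is not stated in the abstract.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel images that differ by 10 in a single pixel.
score = psnr([0, 0, 0, 0], [0, 0, 0, 10])
print(round(score, 2))
```

Higher values indicate a restored image closer to the reference; identical images give an infinite score.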

Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding

Apr 13, 2025

Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju, Sougata Sen, Sanjay E. Sarma, Archan Misra

Abstract:3-Dimensional Embodied Reference Understanding (3D-ERU) combines a language description and an accompanying pointing gesture to identify the most relevant target object in a 3D scene. Although prior work has explored pure language-based 3D grounding, there has been limited exploration of 3D-ERU, which also incorporates human pointing gestures. To address this gap, we introduce a data augmentation framework, Imputer, and use it to curate a new benchmark dataset, ImputeRefer, for 3D-ERU by incorporating human pointing gestures into existing 3D scene datasets that only contain language instructions. We also propose Ges3ViG, a novel model for 3D-ERU that achieves ~30% improvement in accuracy as compared to other 3D-ERU models and ~9% compared to other purely language-based 3D grounding models. Our code and dataset are available at https://github.com/AtharvMane/Ges3ViG.

* Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

EyeTrAES: Fine-grained, Low-Latency Eye Tracking via Adaptive Event Slicing

Sep 27, 2024

Argha Sen, Nuwan Bandara, Ila Gokarn, Thivya Kandappu, Archan Misra

Abstract:Eye-tracking technology has gained significant attention in recent years due to its wide range of applications in human-computer interaction, virtual and augmented reality, and wearable health. Traditional RGB camera-based eye-tracking systems often struggle with poor temporal resolution and computational constraints, limiting their effectiveness in capturing rapid eye movements. To address these limitations, we propose EyeTrAES, a novel approach using neuromorphic event cameras for high-fidelity tracking of natural pupillary movement that shows significant kinematic variance. One of EyeTrAES's highlights is the use of a novel adaptive windowing/slicing algorithm that ensures just the right amount of descriptive asynchronous event data accumulation within an event frame, across a wide range of eye movement patterns. EyeTrAES then applies lightweight image processing functions over accumulated event frames from just a single eye to perform pupil segmentation and tracking. We show that these methods boost pupil tracking fidelity by more than 6%, achieving IoU ≈ 92%, while incurring at least 3x lower latency than competing pure event-based eye tracking alternatives [38]. We additionally demonstrate that the microscopic pupillary motion captured by EyeTrAES exhibits distinctive variations across individuals and can thus serve as a biometric fingerprint. For robust user authentication, we train a lightweight per-user Random Forest classifier using a novel feature vector of short-term pupillary kinematics, comprising a sliding window of pupil (location, velocity, acceleration) triples. Experimental studies with two different datasets demonstrate that the EyeTrAES-based authentication technique can simultaneously achieve high authentication accuracy (≈0.82) and low processing latency (≈12 ms), and significantly outperform multiple state-of-the-art competitive baselines.

* 32 pages, 15 figures
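
The abstract describes an authentication feature vector built from a sliding window of pupil (location, velocity, acceleration) triples, without saying how the triples are computed. One plausible reading, sketched below, derives velocity and acceleration from successive pupil centers by finite differences and then flattens each window of triples into one vector. The function names, the uniform sampling interval `dt`, and the sample coordinates are hypothetical. The paper's adaptive slicing would likely yield non-uniform intervals, which this sketch ignores.

```python
def kinematic_triples(positions, dt):
    """Turn (x, y) pupil centers sampled every `dt` seconds into
    (location, velocity, acceleration) triples via backward differences.

    Assumption: uniform sampling. The first two samples produce no triple
    because acceleration needs two prior velocity estimates.
    """
    triples = []
    for i in range(2, len(positions)):
        (x0, y0), (x1, y1), (x2, y2) = positions[i - 2], positions[i - 1], positions[i]
        v_prev = ((x1 - x0) / dt, (y1 - y0) / dt)
        v = ((x2 - x1) / dt, (y2 - y1) / dt)
        a = ((v[0] - v_prev[0]) / dt, (v[1] - v_prev[1]) / dt)
        triples.append(((x2, y2), v, a))
    return triples


def sliding_feature_vectors(triples, window):
    """Flatten each run of `window` consecutive triples into one feature vector."""
    features = []
    for start in range(len(triples) - window + 1):
        flat = []
        for loc, vel, acc in triples[start:start + window]:
            flat.extend([*loc, *vel, *acc])
        features.append(flat)
    return features


# Hypothetical pupil centers (pixels) for a steadily accelerating movement.
centers = [(0, 0), (1, 0), (3, 0), (6, 0)]
vectors = sliding_feature_vectors(kinematic_triples(centers, dt=1.0), window=2)
print(vectors)
```

Each resulting vector has 6 × `window` entries and could be passed to any off-the-shelf classifier; the abstract names a per-user Random Forest.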

Demo: RhythmEdge: Enabling Contactless Heart Rate Estimation on the Edge

Aug 13, 2022

Zahid Hasan, Emon Dey, Sreenivasan Ramasamy Ramamurthy, Nirmalya Roy, Archan Misra

Abstract:In this demo paper, we design and prototype RhythmEdge, a low-cost, deep-learning-based contactless system for regular HR monitoring applications. RhythmEdge improves on existing approaches by offering contactless operation, real-time or offline processing, and reliance on inexpensive, readily available sensing components and computing devices. Our RhythmEdge system is portable and easily deployable for reliable HR estimation in moderately controlled indoor or outdoor environments. RhythmEdge measures HR by detecting changes in blood volume from facial videos (Remote Photoplethysmography; rPPG) and provides instant assessment using off-the-shelf, commercially available, resource-constrained edge platforms and video cameras. We demonstrate the scalability, flexibility, and compatibility of RhythmEdge by deploying it on three resource-constrained platforms of differing architectures (NVIDIA Jetson Nano, Google Coral Development Board, Raspberry Pi) and three heterogeneous cameras of differing sensitivity, resolution, and properties (web camera, action camera, and DSLR). RhythmEdge further stores longitudinal cardiovascular information and provides instant notification to the users. We thoroughly test the prototype's stability, latency, and feasibility on the three edge computing platforms by profiling their runtime, memory, and power usage.

DeepLight: Robust & Unobtrusive Real-time Screen-Camera Communication for Real-World Displays

May 11, 2021

Vu Tran, Gihan Jayatilaka, Ashwin Ashok, Archan Misra

Abstract:The paper introduces a novel, holistic approach for robust Screen-Camera Communication (SCC), where video content on a screen is visually encoded in a human-imperceptible fashion and decoded by a camera capturing images of such screen content. We first show that state-of-the-art SCC techniques have two key limitations for in-the-wild deployment: (a) the decoding accuracy drops rapidly under even modest screen extraction errors from the captured images, and (b) they generate perceptible flickers on common refresh rate screens even with minimal modulation of pixel intensity. To overcome these challenges, we introduce DeepLight, a system that incorporates machine learning (ML) models in the decoding pipeline to achieve humanly-imperceptible, moderately high SCC rates under diverse real-world conditions. DeepLight's key innovation is the design of a Deep Neural Network (DNN) based decoder that collectively decodes all the bits spatially encoded in a display frame, without attempting to precisely isolate the pixels associated with each encoded bit. In addition, DeepLight supports imperceptible encoding by selectively modulating the intensity of only the Blue channel, and provides reasonably accurate screen extraction (IoU values ≥ 83%) by using state-of-the-art object detection DNN pipelines. We show that a fully functional DeepLight system is able to robustly achieve high decoding accuracy (frame error rate < 0.2) and moderately high data goodput (≥ 0.95 Kbps) using a human-held smartphone camera, even over larger screen-camera distances (≈ 2 m).

* Accepted for IPSN 2021 (ACM/IEEE International Conference on Information Processing in Sensor Networks 2021)

Enabling Collaborative Video Sensing at the Edge through Convolutional Sharing

Dec 03, 2020

Kasthuri Jayarajah, Dhanuja Wanniarachchige, Archan Misra

Abstract:While Deep Neural Network (DNN) models have provided remarkable advances in machine vision capabilities, their high computational complexity and model sizes present a formidable roadblock to deployment in AIoT-based sensing applications. In this paper, we propose a novel paradigm by which peer nodes in a network can collaborate to improve their accuracy on person detection, an exemplar machine vision task. The proposed methodology requires no re-training of the DNNs and incurs minimal processing latency, as it extracts scene summaries from the collaborators and injects them back into the DNNs of the reference cameras on the fly. Early results on benchmark datasets show promise, with improvements in recall as high as 10% with a single collaborator.

Prior Activation Distribution (PAD): A Versatile Representation to Utilize DNN Hidden Units

Jul 05, 2019

Lakmal Meegahapola, Vengateswaran Subramaniam, Lance Kaplan, Archan Misra

Abstract:In this paper, we introduce the concept of Prior Activation Distribution (PAD) as a versatile and general technique to capture the typical activation patterns of hidden layer units of a Deep Neural Network used for classification tasks. We show that the combined neural activations of such a hidden layer have class-specific distributional properties, and then define multiple statistical measures to compute how far a test sample's activations deviate from such distributions. Using a variety of benchmark datasets (including MNIST, CIFAR10, Fashion-MNIST & notMNIST), we show how such PAD-based measures can be used, independent of any training technique, to (a) derive fine-grained uncertainty estimates for inferences; (b) provide inferencing accuracy competitive with alternatives that require execution of the full pipeline, and (c) reliably isolate out-of-distribution test samples.

* Submitted to NeurIPS 2019
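
The PAD abstract mentions class-specific distributions of hidden-layer activations and statistical measures of how far a test sample deviates from them, but does not name the measures. The sketch below uses one simple stand-in chosen purely for illustration: a per-unit mean and standard deviation for each class, with deviation scored as the mean absolute z-score. The function names, the `eps` guard, and the toy activations are hypothetical and should not be read as the paper's definitions.

```python
from statistics import mean, pstdev


def fit_class_stats(activations_by_class):
    """For each class label, record (mean, std) of every hidden unit.

    `activations_by_class` maps a label to a list of activation vectors
    collected from training samples of that class.
    """
    stats = {}
    for label, rows in activations_by_class.items():
        columns = list(zip(*rows))  # one tuple of values per hidden unit
        stats[label] = [(mean(col), pstdev(col)) for col in columns]
    return stats


def deviation(sample, unit_stats, eps=1e-8):
    """Mean absolute z-score of a sample against one class's unit statistics."""
    return mean(abs(a - mu) / (sd + eps) for a, (mu, sd) in zip(sample, unit_stats))


# Toy two-unit activations for two classes (made-up numbers).
train = {
    "a": [[0.0, 1.0], [2.0, 3.0]],
    "b": [[10.0, 10.0], [12.0, 14.0]],
}
stats = fit_class_stats(train)
sample = [1.0, 2.0]
scores = {label: deviation(sample, s) for label, s in stats.items()}
closest = min(scores, key=scores.get)
print(closest, scores)
```

A small deviation for one class and large deviations for the rest suggests a confident prediction, while a large deviation for every class hints at an out-of-distribution input, which matches the three uses listed in the abstract.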

BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs

Sep 22, 2017

Jagmohan Chauhan, Suranga Seneviratne, Yining Hu, Archan Misra, Aruna Seneviratne, Youngki Lee

Abstract:Recurrent neural networks (RNNs) have shown promising results in audio and speech processing applications due to their strong capabilities in modelling sequential data. In many applications, RNNs tend to outperform conventional models based on GMM/UBMs and i-vectors. The increasing popularity of IoT devices makes a strong case for implementing RNN-based inference for applications such as acoustics-based authentication, voice commands, and edge analytics for smart homes. Nonetheless, the feasibility and performance of RNN-based inference on resource-constrained IoT devices remain largely unexplored. In this paper, we investigate the feasibility of using RNNs for an end-to-end authentication system based on breathing acoustics. We evaluate the performance of RNN models on three types of devices (smartphone, smartwatch, and Raspberry Pi) and show that, unlike CNN models, RNN models can be easily ported onto resource-constrained devices without a significant loss in accuracy.
