Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michele Mazzamuto

Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Jun 12, 2024

Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella

Figure 1 for Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Figure 2 for Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Figure 3 for Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Figure 4 for Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

Abstract:In this paper, we address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a critical component for advancing user assistance in smart glasses. Traditional supervised methods, reliant on manually labeled mistakes, suffer from domain-dependence and scalability issues. This research introduces an unsupervised method for detecting mistakes in videos of human activities, overcoming the challenges of domain-specific requirements and the necessity for annotated data. By analyzing unusual gaze patterns that signal user disorientation during tasks, we propose a gaze completion model that forecasts eye gaze trajectories from incomplete inputs. The difference between the anticipated and observed gaze paths acts as an indicator for identifying errors. Our method is validated on the EPIC-Tent dataset, showing its superiority compared to current one-class supervised and unsupervised techniques.

Via

Access Paper or Ask Questions

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

Sep 26, 2023

Francesco Ragusa, Rosario Leonardi, Michele Mazzamuto, Claudia Bonanno, Rosario Scavo, Antonino Furnari, Giovanni Maria Farinella

Abstract:ENIGMA-51 is a new egocentric dataset acquired in a real industrial domain by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., electric screwdriver) and electronic instruments (e.g., oscilloscope). The 51 sequences are densely annotated with a rich set of labels that enable the systematic study of human-object interactions in the industrial domain. We provide benchmarks on four tasks related to human-object interactions: 1) untrimmed action detection, 2) egocentric human-object interaction detection, 3) short-term object interaction anticipation and 4) natural language understanding of intents and entities. Baseline results show that the ENIGMA-51 dataset poses a challenging benchmark to study human-object interactions in industrial scenarios. We publicly release the dataset at: https://iplab.dmi.unict.it/ENIGMA-51/.

Via

Access Paper or Ask Questions

Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Apr 14, 2022

Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

Figure 1 for Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Figure 2 for Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Figure 3 for Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Figure 4 for Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Abstract:We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which leans only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We hence compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following url: https://iplab.dmi.unict.it/WS_OBJ_DET/

Via

Access Paper or Ask Questions