Gabriele Goletto

Egocentric zone-aware action recognition across environments

Sep 21, 2024

Simone Alberto Peirone, Gabriele Goletto, Mirco Planamente, Andrea Bottino, Barbara Caputo, Giuseppe Averta

Abstract: Human activities exhibit a strong correlation between actions and the places where these are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named activity-centric zones, which may afford a set of homogeneous actions. Their knowledge can serve as a prior to favor vision models to recognize human activities. However, the appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains. This problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even more difficult to separate the action from the context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. We validate our solution on the EPIC-Kitchens-100 and Argo1M datasets.

* Project webpage: https://gabrielegoletto.github.io/EgoZAR/

AMEGO: Active Memory from long EGOcentric videos

Sep 17, 2024

Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta, Dima Damen

Abstract: Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by the human's ability to maintain information from a single watching, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.

* Accepted to ECCV 2024. Project webpage: https://gabrielegoletto.github.io/AMEGO/

EarthMatch: Iterative Coregistration for Fine-grained Localization of Astronaut Photography

May 08, 2024

Gabriele Berton, Gabriele Goletto, Gabriele Trivigno, Alex Stoken, Barbara Caputo, Carlo Masone

Abstract: Precise, pixel-wise geolocalization of astronaut photography is critical to unlocking the potential of this unique type of remotely sensed Earth data, particularly for its use in disaster management and climate change research. Recent works have established the Astronaut Photography Localization task, but have either proved too costly for mass deployment or generated too coarse a localization. Thus, we present EarthMatch, an iterative homography estimation method that produces fine-grained localization of astronaut photographs while maintaining an emphasis on speed. We refocus the astronaut photography benchmark, AIMS, on the geolocalization task itself, and prove our method's efficacy on this dataset. In addition, we offer a new, fair method for image matcher comparison, and an extensive evaluation of different matching models within our localization pipeline. Our method will enable fast and accurate localization of the 4.5 million and growing collection of astronaut photography of Earth. Webpage with code and data at https://earthloc-and-earthmatch.github.io

* CVPR 2024 IMW - webpage: https://earthloc-and-earthmatch.github.io

An Outlook into the Future of Egocentric Vision

Aug 14, 2023

Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi

Abstract: What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated in our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future always-on, personalised and life-enhancing egocentric vision.

* We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1w

Bringing Online Egocentric Action Recognition into the wild

Nov 06, 2022

Gabriele Goletto, Mirco Planamente, Barbara Caputo, Giuseppe Averta

Abstract: To enable safe and effective human-robot cooperation, it is crucial to develop models for the identification of human activities. Egocentric vision seems to be a viable solution to this problem, and therefore many works provide deep learning solutions to infer human actions from first-person videos. However, although very promising, most of these do not consider the major challenges that come with a realistic deployment, such as the portability of the model, the need for real-time inference, and the robustness to novel domains (i.e., new spaces, users, tasks). With this paper, we set the boundaries that egocentric vision models should consider for realistic applications, defining a novel setting of egocentric action recognition in the wild, which encourages researchers to develop novel, application-aware solutions. We also present a new model-agnostic technique that enables the rapid repurposing of existing architectures in this new context, demonstrating the feasibility of deploying a model on a tiny device (Jetson Nano) and performing the task directly on the edge with very low energy consumption (2.4W on average at 50 fps).

PoliTO-IIT-CINI Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

Sep 09, 2022

Mirco Planamente, Gabriele Goletto, Gabriele Trivigno, Giuseppe Averta, Barbara Caputo

Abstract: In this report, we describe the technical details of our submission to the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. To tackle the domain shift that exists under the UDA setting, we first exploited a recent Domain Generalization (DG) technique, called Relative Norm Alignment (RNA). Secondly, we extended this approach to work on unlabelled target data, enabling a simpler adaptation of the model to the target distribution in an unsupervised fashion. To this end, we included in our framework UDA algorithms, such as multi-level adversarial alignment and attentive entropy. By analyzing the challenge setting, we noticed the presence of a secondary concurrence shift in the data, which is usually called environmental bias. It is caused by the existence of different environments, i.e., kitchens. To deal with these two shifts (environmental and temporal), we extended our system to perform Multi-Source Multi-Target Domain Adaptation. Finally, we employed distinct models in our final proposal to leverage the potential of popular video architectures, and we introduced two more losses for the ensemble adaptation. Our submission (entry 'plnet') is visible on the leaderboard and ranked in 2nd position for 'verb', and in 3rd position for both 'noun' and 'action'.

* 3rd place in the 2022 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition. arXiv admin note: substantial text overlap with arXiv:2107.00337

E$^2$MOTION: Motion Augmented Event Stream for Egocentric Action Recognition

Dec 07, 2021

Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo

Abstract: Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level intensity changes in the form of "events". Due to their sensing mechanism, event cameras have little to no motion blur, a very high temporal resolution and require significantly less power and memory than traditional frame-based cameras. These characteristics make them a perfect fit for several real-world applications such as egocentric action recognition on wearable devices, where fast camera motion and limited power challenge traditional vision sensors. However, the ever-growing field of event-based vision has, to date, overlooked the potential of event cameras in such applications. In this paper, we show that event data is a very valuable modality for egocentric action recognition. To do so, we introduce N-EPIC-Kitchens, the first event-based camera extension of the large-scale EPIC-Kitchens dataset. In this context, we propose two strategies: (i) directly processing event-camera data with traditional video-processing architectures (E$^2$(GO)) and (ii) using event data to distill optical flow information (E$^2$(GO)MO). On our proposed benchmark, we show that event data provides comparable performance to RGB and optical flow, yet without any additional flow computation at deploy time, and an improved performance of up to 4% with respect to RGB-only information.
