François Bremond

INRIA Sophia Antipolis

From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

Oct 19, 2021

Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond

Abstract: Multimodal deep learning has garnered much interest, and transformers have triggered novel approaches thanks to the cross-attention mechanism. Here we propose an approach to address two key challenges: the high computational resources required and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, the levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, reduced the number of parameters by 2.5 times, and reduced the inference time by 22%. Such a performance-computation tradeoff can be exploited in many applications, and we aim to open a new research area in which complex models must be deployed with limited resources.

* Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 10 pages
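
As a rough illustration of the student-teacher idea in the abstract above, the sketch below computes a classic soft-target distillation loss. The abstract does not state which loss the authors use, so the temperature-scaled KL term, the `alpha` weighting, and all function names here are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (teacher vs. student) with a hard-label term.

    Assumed setup: the teacher saw several modalities, the student only one.
    Both produce logits over the same classes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term on a scale comparable to the hard term.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Standard cross-entropy of the student against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label term remains.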

FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

Oct 10, 2021

Neelabh Sinha, Michal Balazia, François Bremond

Abstract: 3D gaze estimation is about predicting the line of sight of a person in 3D space. Person-independent models for this task lack precision due to anatomical differences between subjects, whereas person-specific calibrated techniques impose strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), which incorporates eye anatomical information through eye landmark heatmaps to obtain precise gaze estimation without any person-specific calibration. Our evaluation demonstrates competitive performance, with an improvement of about 10% on the benchmark datasets ColumbiaGaze and EYEDIAP. We also conduct an ablation study to validate our method.

* Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 8 pages

One-class Autoencoder Approach for Optimal Electrode Set-up Identification in Wearable EEG Event Monitoring

Apr 13, 2021

Laura M. Ferrari, Guy Abi Hanna, Paolo Volpe, Esma Ismailova, François Bremond, Maria A. Zuluaga

Abstract: A limiting factor in the wide routine use of wearable devices for continuous healthcare monitoring is their cumbersome and obtrusive nature. This is particularly true for electroencephalography (EEG) recordings, which require the placement of multiple electrodes in contact with the scalp. In this work, we propose to identify the optimal wearable EEG electrode set-up, in terms of minimal number of electrodes, comfortable location and performance, for EEG-based event detection and monitoring. By relying on the demonstrated power of autoencoder (AE) networks to learn latent representations from high-dimensional data, our proposed strategy trains an AE architecture in a one-class classification setup with different electrode set-ups as input data. The resulting models are assessed using the F-score, and the best set-up is chosen according to the established optimality criteria. Using alpha wave detection as a use case, we demonstrate that the proposed method can detect an alpha state from an optimal set-up consisting of electrodes on the forehead and behind the ear, with an average F-score of 0.78. Our results suggest that a learning-based approach can be used to enable the design and implementation of optimized wearable devices for real-life healthcare monitoring.

Automatic Tracker Selection w.r.t Object Detection Performance

Apr 08, 2014

Duc Phu Chau, François Bremond, Monique Thonnat, Slawomir Bak

Abstract: The performance of a tracking algorithm depends on video content. This paper presents a new multi-object tracking approach which is able to cope with video content variations. First, the object detection is improved using Kanade-Lucas-Tomasi (KLT) feature tracking. Second, for each mobile object, an appropriate tracker is selected between a KLT-based tracker and a discriminative appearance-based tracker. This selection is supported by an online tracking evaluation. The approach has been evaluated on three public video datasets. The experimental results show that the proposed approach performs better than recent state-of-the-art trackers.

* IEEE Winter Conference on Applications of Computer Vision (WACV 2014) (2014)

Online Tracking Parameter Adaptation based on Evaluation

Jul 22, 2013

Duc Phu Chau, Julien Badie, François Bremond, Monique Thonnat

Abstract: Parameter tuning is a common issue for many tracking algorithms. To address this problem, this paper proposes an online parameter tuning method that adapts a tracking algorithm to various scene contexts. In an offline training phase, this approach learns how to tune the tracker parameters to cope with different contexts. In the online control phase, once the tracking quality is evaluated as not good enough, the proposed approach computes the current context and tunes the tracking parameters using the learned values. The experimental results show that the proposed approach improves the performance of the tracking algorithm and outperforms recent state-of-the-art trackers. This paper makes two contributions: (1) an online tracking evaluation, and (2) a method to adapt online tracking parameters to scene contexts.

* IEEE International Conference on Advanced Video and Signal-based Surveillance (2013)

Automatic Parameter Adaptation for Multi-object Tracking

May 13, 2013

Duc Phu Chau, Monique Thonnat, François Bremond

Abstract: Object tracking quality usually depends on video context (e.g. object occlusion level, object density). To decrease this dependency, this paper presents a learning approach that adapts the tracker parameters to context variations. In an offline phase, satisfactory tracking parameters are learned for video context clusters. In the online control phase, once a context change is detected, the tracking parameters are tuned using the learned values. The experimental results show that the proposed approach outperforms recent state-of-the-art trackers. This paper makes two contributions: (1) a classification method for video sequences to learn offline tracking parameters, and (2) a new method to tune online tracking parameters using the tracking context.

* International Conference on Computer Vision Systems (ICVS) (2013)

Object Tracking in Videos: Approaches and Issues

Apr 18, 2013

Duc Phu Chau, François Bremond, Monique Thonnat

Abstract: Mobile object tracking plays an important role in computer vision applications. In this paper, we use a taxonomy based on the tracked target to present object tracking algorithms. The tracked targets are divided into three categories: points of interest, appearance, and silhouette of mobile objects. The advantages and limitations of the tracking approaches are also analyzed to identify future directions in the object tracking domain.

* The International Workshop "Rencontres UNS-UD" (RUNSUD) (2013)

A generic framework for video understanding applied to group behavior recognition

Jun 22, 2012

Sofia Zaidenberg, Bernard Boulay, François Bremond

Abstract: This paper presents an approach to detect and track groups of people in video-surveillance applications, and to automatically recognize their behavior. This method keeps track of individuals moving together by maintaining spatial and temporal group coherence. First, people are individually detected and tracked. Second, their trajectories are analyzed over a temporal window and clustered using the Mean-Shift algorithm. A coherence value describes how well a set of people can be described as a group. Furthermore, we propose a formal event description language. The group event recognition approach is successfully validated on 4 camera views from 3 datasets: an airport, a subway, a shopping center corridor and an entrance hall.

* 9th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS 2012) (2012) 136-142
* (20/03/2012)
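
The trajectory clustering step described in the abstract above can be sketched with a small flat-kernel Mean-Shift. The abstract does not say which trajectory features are clustered, so this example assumes each person is summarized by a 2D point (for instance, the mean position over the temporal window). The bandwidth value and the function name `mean_shift_labels` are hypothetical.

```python
import math


def mean_shift_labels(points, bandwidth, max_iter=50, tol=1e-6):
    """Cluster points with flat-kernel Mean-Shift and return one label per point."""
    modes = []
    for start in points:
        x = list(start)
        for _ in range(max_iter):
            # Flat kernel: every point inside the window has equal weight.
            window = [q for q in points if math.dist(q, x) <= bandwidth]
            new_x = [sum(coord) / len(window) for coord in zip(*window)]
            shift = math.dist(new_x, x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Points whose modes nearly coincide are assigned to the same group.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth / 2:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


# Three people walking close together and two others further away.
people = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9)]
print(mean_shift_labels(people, bandwidth=1.0))
```

Mean-Shift does not need the number of groups in advance, which suits scenes where groups form and split over time. The coherence value mentioned in the abstract is a separate measure and is not modeled here.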

A multi-feature tracking algorithm enabling adaptation to context variations

Dec 06, 2011

Duc Phu Chau, François Bremond, Monique Thonnat

Abstract: In this paper we propose a tracking algorithm which is able to adapt itself to different scene contexts. A feature pool is used to compute the matching score between two detected objects. This feature pool includes 2D and 3D displacement distances, 2D sizes, color histogram, histogram of oriented gradients (HOG), color covariance and dominant color. An offline learning process is proposed to search for useful features and to estimate their weights for each context. In the online tracking process, a temporal window is defined to establish the links between the detected objects. This makes it possible to recover object trajectories even if the objects are misdetected in some frames. A trajectory filter is proposed to remove noisy trajectories. Experiments on different contexts are presented. The proposed tracker has been tested on videos belonging to three public datasets and to the Caretaker European project. The experimental results demonstrate the effect of the proposed feature weight learning and the robustness of the proposed tracker compared to several state-of-the-art methods. The contributions of our approach over state-of-the-art trackers are: (i) a robust tracking algorithm based on a feature pool, (ii) a supervised learning scheme to learn feature weights for each context, (iii) a new method to quantify the reliability of the HOG descriptor, (iv) a combination of color covariance and dominant color features with spatial pyramid distance to manage the case of object occlusion.

* The International Conference on Imaging for Crime Detection and Prevention (ICDP) (2011)
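
A feature-pool matching score of the kind described in the abstract above can be pictured as a weighted combination of per-feature similarities. The abstract does not give the exact formula, so the normalized weighted sum below, the feature names, and the weight values are assumptions for illustration.

```python
def matching_score(similarities, weights):
    """Weighted average of per-feature similarities, each assumed to lie in [0, 1].

    `similarities` maps a feature name to the similarity between two detections.
    `weights` maps a feature name to its learned weight for the current context.
    """
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        return 0.0  # no usable feature in this context
    weighted = sum(weights[name] * value for name, value in similarities.items())
    return weighted / total_weight


# Hypothetical weights for one context: color is trusted more than HOG.
context_weights = {"color_histogram": 3.0, "hog": 1.0}
pair_similarities = {"color_histogram": 0.8, "hog": 0.4}
print(matching_score(pair_similarities, context_weights))
```

Setting a weight to zero removes that feature from the score, which mirrors the idea of searching for the useful features in each context.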

Robust Mobile Object Tracking Based on Multiple Feature Similarity and Trajectory Filtering

Jun 14, 2011

Duc Phu Chau, François Bremond, Monique Thonnat, Etienne Corvee

Abstract: This paper presents a new algorithm to track mobile objects in different scene conditions. The proposed tracker combines state estimation, multi-feature similarity measures and trajectory filtering. A feature set (distance, area, shape ratio, color histogram) is defined for each tracked object to search for the best matching object. The best matching object and the state estimated by the Kalman filter are combined to update the position and size of the tracked object. However, mobile object trajectories are usually fragmented because of occlusions and misdetections. Therefore, we also propose a trajectory filter, named global tracker, which aims at removing noisy trajectories and fusing fragmented trajectories belonging to the same mobile object. The method has been tested on five videos with different scene conditions. Three of them are provided by the ETISEO benchmarking project (http://www-sop.inria.fr/orion/ETISEO), in which the proposed tracker's performance has been compared with seven other tracking algorithms. The advantages of our approach over existing state-of-the-art ones are: (i) no prior knowledge is required (e.g. no calibration and no contextual models are needed), (ii) the tracker is more reliable because it combines multiple feature similarities, (iii) the tracker can perform in different scene conditions: single/several mobile objects, weak/strong illumination, indoor/outdoor scenes, (iv) a trajectory filter is defined and applied to improve the tracker performance, (v) the tracker outperforms many state-of-the-art algorithms.

* The International Conference on Computer Vision Theory and Applications (VISAPP) (2011)
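
The abstract above says that the best matching detection and the Kalman-estimated state are combined to update the tracked object. As a minimal sketch, the scalar Kalman measurement update below blends a predicted coordinate with a measured one. Treating each coordinate independently and the variance values used are simplifying assumptions, not details taken from the paper.

```python
def kalman_update(predicted, predicted_var, measured, measured_var):
    """One scalar Kalman measurement update.

    Returns the fused estimate and its variance. The gain moves the estimate
    toward whichever input (prediction or measurement) is more certain.
    """
    gain = predicted_var / (predicted_var + measured_var)
    estimate = predicted + gain * (measured - predicted)
    variance = (1.0 - gain) * predicted_var
    return estimate, variance


# Predicted x-position 10.0 and matched detection at 12.0, equally uncertain.
print(kalman_update(10.0, 4.0, 12.0, 4.0))  # (11.0, 2.0)
```

With equal variances the update lands halfway between prediction and measurement, and a noisier detection would pull the estimate less.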
