Arij Bouazizi

Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving

Jul 27, 2023

Peter Bauer, Arij Bouazizi, Ulrich Kressel, Fabian B. Flohr

Abstract: Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to ~13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.

* 7 pages, Accepted at IEEE-IV 2023
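
The abstract mentions pseudo labels generated from LiDAR to image projections. As a rough illustration of the projection step only, the sketch below applies a standard pinhole camera model. It assumes the LiDAR points have already been transformed into the camera frame (z pointing forward); the function name `project_points` and the intrinsics used in the usage note are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_points(
    points_cam: List[Point3D],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Optional[Pixel]]:
    """Project camera-frame 3D points to pixel coordinates (pinhole model).

    Assumption: points are already in the camera frame with z forward.
    Points behind the camera or outside the image map to None.
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            pixels.append(None)  # behind the image plane
            continue
        u = fx * x / z + cx  # perspective division, then principal-point shift
        v = fy * y / z + cy
        inside = 0.0 <= u < width and 0.0 <= v < height
        pixels.append((u, v) if inside else None)
    return pixels
```

For example, `project_points([(1.0, 0.5, 10.0)], 1000.0, 1000.0, 640.0, 360.0, 1280, 720)` maps the point to pixel (740, 410) under these made-up intrinsics. A real pipeline would also need the LiDAR-to-camera extrinsics and lens distortion handling, which this sketch omits.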

Knowing What to Label for Few Shot Microscopy Image Cell Segmentation

Nov 18, 2022

Youssef Dawoud, Arij Bouazizi, Katharina Ernst, Gustavo Carneiro, Vasileios Belagiannis

Abstract: In microscopy image cell segmentation, it is common to train a deep neural network on source data, containing different types of microscopy images, and then fine-tune it using a support set comprising a few randomly selected and annotated training target images. In this paper, we argue that the random selection of unlabelled training target images to be annotated and included in the support set may not enable an effective fine-tuning process, so we propose a new approach to optimise this image selection process. Our approach involves a new scoring function to find informative unlabelled target images. In particular, we propose to measure the consistency in the model predictions on target images against specific data augmentations. However, we observe that the model trained with source datasets does not reliably evaluate consistency on target images. To alleviate this problem, we propose novel self-supervised pretext tasks to compute the scores of unlabelled target images. Finally, the few images with the lowest consistency scores are added to the support set for oracle (i.e., expert) annotation and later used to fine-tune the model to the target images. In our evaluations that involve the segmentation of five different types of cell images, we demonstrate promising results on several target test sets compared to the random selection approach as well as other selection approaches, such as Shannon's entropy and Monte-Carlo dropout.

* Accepted to WACV 2023
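
The selection idea in the abstract (score each unlabelled image by how consistent the predictions stay under augmentation, then annotate the least consistent ones) can be sketched as follows. This is only an illustration of the basic principle, not the paper's scoring function, which relies on self-supervised pretext tasks. It assumes binary masks flattened to lists and assumes predictions on augmented images have already been mapped back to the original image frame; all names are hypothetical.

```python
from typing import Dict, List, Sequence

Mask = Sequence[int]  # flattened binary segmentation mask


def agreement(a: Mask, b: Mask) -> float:
    """Fraction of pixels on which two equally sized masks agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("masks must be non-empty and of equal size")
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)


def consistency_score(pred: Mask, aug_preds: Sequence[Mask]) -> float:
    """Mean agreement between the base prediction and each augmented one."""
    if len(aug_preds) == 0:
        raise ValueError("need at least one augmented prediction")
    return sum(agreement(pred, m) for m in aug_preds) / len(aug_preds)


def select_for_annotation(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k image ids with the lowest consistency scores."""
    return sorted(scores, key=lambda name: scores[name])[:k]
```

Images returned by `select_for_annotation` would then be passed to the expert annotator and added to the support set.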

MotionMixer: MLP-based 3D Human Body Pose Forecasting

Jul 01, 2022

Arij Bouazizi, Adrian Holzbock, Ulrich Kressel, Klaus Dietmayer, Vasileios Belagiannis

Abstract: In this work, we present MotionMixer, an efficient 3D human body pose forecasting model based solely on multi-layer perceptrons (MLPs). MotionMixer learns the spatial-temporal 3D body pose dependencies by sequentially mixing both modalities. Given a stacked sequence of 3D body poses, a spatial MLP extracts fine-grained spatial dependencies of the body joints. The interaction of the body joints over time is then modelled by a temporal MLP. The spatial-temporal mixed features are finally aggregated and decoded to obtain the future motion. To calibrate the influence of each time step in the pose sequence, we make use of squeeze-and-excitation (SE) blocks. We evaluate our approach on Human3.6M, AMASS, and 3DPW datasets using the standard evaluation protocols. For all evaluations, we demonstrate state-of-the-art performance, while having a model with a smaller number of parameters. Our code is available at: https://github.com/MotionMLP/MotionMixer

* Accepted by IJCAI-ECAI'22 (Oral-Long presentation)
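
To make the "sequential mixing" idea concrete, here is a heavily simplified sketch of one mixing block on a pose sequence stored as a T x J matrix (T time steps, J joint coordinates). It is not the released MotionMixer code: each MLP is reduced to a single linear layer with ReLU, the residual connections are an assumption borrowed from common Mixer-style designs, and the normalisation and squeeze-and-excitation blocks are left out. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(a: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mixer_block(x: Matrix, w_spatial: Matrix, w_temporal: Matrix) -> Matrix:
    """One simplified mixing block on x of shape (T, J).

    w_spatial (J x J) mixes joint coordinates within each frame;
    w_temporal (T x T) mixes frames for each joint coordinate.
    """
    x = add(x, relu(matmul(x, w_spatial)))   # spatial mixing + assumed residual
    x = add(x, relu(matmul(w_temporal, x)))  # temporal mixing + assumed residual
    return x
```

Multiplying from the right touches only the joint axis, while multiplying from the left touches only the time axis; applying the two in sequence is what lets a pure-MLP model capture both kinds of dependency.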

Anomaly Detection in Multi-Agent Trajectories for Automated Driving

Oct 28, 2021

Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, Vasileios Belagiannis

Abstract: Human drivers can quickly recognise abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we first estimate a density function of the learned trajectory feature representation and then detect anomalies in low-density regions. Due to the lack of multi-agent trajectory datasets for anomaly detection in automated driving, we introduce our dataset using a driving simulator for normal and abnormal manoeuvres. Our evaluations show that our approach learns the relation between different agents and delivers promising results compared to the related works. The code, simulation and the dataset are publicly available at https://github.com/againerju/maad_highway.

* 15 pages incl. supplementary material, 8 figures, 4 tables (accepted by CoRL 2021)
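
The detection step described in the abstract (estimate a density over learned features of normal driving, then flag samples in low-density regions) might look roughly like the sketch below. The abstract does not name the density estimator, so the isotropic Gaussian kernel density estimate, the bandwidth, and the threshold here are all assumptions, and the feature vectors stand in for the auto-encoder's learned representation.

```python
import math
from typing import Sequence

Feature = Sequence[float]


def gaussian_kde(train: Sequence[Feature], query: Feature, bandwidth: float) -> float:
    """Density at `query` from an isotropic Gaussian KDE fitted on `train`."""
    if len(train) == 0 or bandwidth <= 0.0:
        raise ValueError("need training features and a positive bandwidth")
    dim = len(query)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for feat in train:
        sq_dist = sum((q - f) ** 2 for q, f in zip(query, feat))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(train) * norm)


def is_anomaly(train: Sequence[Feature], query: Feature,
               bandwidth: float, threshold: float) -> bool:
    """Flag the query as anomalous if its estimated density is below threshold."""
    return gaussian_kde(train, query, bandwidth) < threshold
```

In practice the threshold would be chosen on held-out data, since only features of normal driving are used to fit the density.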

Learning Temporal 3D Human Pose Estimation with Pseudo-Labels

Oct 14, 2021

Arij Bouazizi, Ulrich Kressel, Vasileios Belagiannis

Abstract: We present a simple, yet effective, approach for self-supervised 3D human pose estimation. Unlike prior work, we explore the temporal information next to the multi-view self-supervision. During training, we rely on triangulating 2D body pose estimates of a multiple-view camera system. A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton. During inference, our model receives a sequence of 2D body pose estimates from a single view to predict the 3D body pose for each of them. An extensive evaluation shows that our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks. Our code and models are publicly available at https://github.com/vru2020/TM_HPE/.

* Accepted for publication at AVSS 2021. Project page: https://github.com/vru2020/TM_HPE/
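
The abstract relies on triangulating 2D pose estimates from several views to obtain 3D pseudo-labels but does not say which triangulation method is used. As one possible illustration, the sketch below applies the classic midpoint method to a single joint seen from two cameras. It assumes each 2D detection has already been back-projected into a ray (camera centre plus direction) in a shared world frame; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return (
        (p1[0] + p2[0]) / 2.0,
        (p1[1] + p2[1]) / 2.0,
        (p1[2] + p2[2]) / 2.0,
    )
```

With noisy 2D detections the two rays rarely intersect, which is why the midpoint of their closest approach is used as the 3D estimate. Setups with more than two views typically use a least-squares formulation instead.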

Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry

Aug 17, 2021

Arij Bouazizi, Julian Wiederer, Ulrich Kressel, Vasileios Belagiannis

Abstract: We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system and 2D body pose estimates for each view. To train our model, represented by a deep neural network, we propose a four-loss function learning algorithm, which does not require any 2D or 3D body pose ground-truth. The proposed loss functions make use of the multiple-view geometry to reconstruct 3D body pose estimates and impose body pose constraints across the camera views. Our approach utilizes all available camera views during training, while the inference is single-view. In our evaluations, we show promising performance on the Human3.6M and HumanEva benchmarks, while we also present a generalization study on the MPI-INF-3DHP dataset, as well as several ablation results. Overall, we outperform all self-supervised learning methods and reach comparable results to supervised and weakly-supervised learning approaches. Our code and models are publicly available

* Accepted for publication at FG 2021

Traffic Control Gesture Recognition for Autonomous Vehicles

Jul 31, 2020

Julian Wiederer, Arij Bouazizi, Ulrich Kressel, Vasileios Belagiannis

Abstract: A car driver knows how to react to the gestures of traffic officers. Clearly, this is not the case for the autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of the existing autonomous driving datasets to provide learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification on every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanism, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches for our dataset, as well as real-world quantitative evaluation. The code and dataset are publicly available.

* 8 pages, 8 figures, 3 tables, accepted by IROS 2020
