Abstract:We propose the first metric learning system for the recognition of great ape behavioural actions. Our proposed triple stream embedding architecture works on camera trap videos taken directly in the wild and demonstrates that the utilisation of an explicit DensePose-C chimpanzee body part segmentation stream effectively complements traditional RGB appearance and optical flow streams. We evaluate system variants with different feature fusion techniques and long-tail recognition approaches. Results and ablations show performance improvements of ~12% in top-1 accuracy over previous results achieved on the PanAf-500 dataset containing 180,000 manually annotated frames across nine behavioural actions. Furthermore, we provide a qualitative analysis of our findings and augment the metric learning system with long-tail recognition techniques showing that average per class accuracy -- critical in the domain -- can be improved by ~23% compared to the literature on that dataset. Finally, since our embedding spaces are constructed as metric, we provide first data-driven visualisations of the great ape behavioural action spaces revealing emerging geometry and topology. We hope that the work sparks further interest in this vital application area of computer vision for the benefit of endangered great apes.
Abstract:The development and application of modern technology is an essential basis for the efficient monitoring of species in natural habitats and landscapes to trace the development of ecosystems, species communities, and populations, and to analyze reasons of changes. For estimating animal abundance using methods such as camera trap distance sampling, spatial information of natural habitats in terms of 3D (three-dimensional) measurements is crucial. Additionally, 3D information improves the accuracy of animal detection using camera trapping. This study presents a novel approach to 3D camera trapping featuring highly optimized hardware and software. This approach employs stereo vision to infer 3D information of natural habitats and is designated as StereO CameRA Trap for monitoring of biodivErSity (SOCRATES). A comprehensive evaluation of SOCRATES shows not only a $3.23\%$ improvement in animal detection (bounding box $\text{mAP}_{75}$) but also its superior applicability for estimating animal abundance using camera trap distance sampling. The software and documentation of SOCRATES is provided at https://github.com/timmh/socrates