Abstract:Developing robust drone detection systems is often constrained by the limited availability of large-scale annotated training data and the high costs associated with real-world data collection. However, leveraging synthetic data generated via game engine-based simulations provides a promising and cost-effective solution to overcome this issue. Therefore, we present SynDroneVision, a synthetic dataset specifically designed for RGB-based drone detection in surveillance applications. Featuring diverse backgrounds, lighting conditions, and drone models, SynDroneVision offers a comprehensive training foundation for deep learning algorithms. To evaluate the dataset's effectiveness, we perform a comparative analysis across a selection of recent YOLO detection models. Our findings demonstrate that SynDroneVision is a valuable resource for real-world data enrichment, achieving notable enhancements in model performance and robustness, while significantly reducing the time and costs of real-world data acquisition. SynDroneVision will be publicly released upon paper acceptance.
Abstract:Floods are an increasingly common global threat, causing emergencies and severe damage to infrastructure. During crises, organisations such as the World Food Programme use remotely sensed imagery, typically obtained through drones, for rapid situational analysis to plan life-saving actions. Computer Vision tools are needed to support task force experts on-site in the evaluation of the imagery to improve their efficiency and to allocate resources strategically. We introduce the BlessemFlood21 dataset to stimulate research on efficient flood detection tools. The imagery was acquired during the 2021 Erftstadt-Blessem flooding event and consists of high-resolution and georeferenced RGB-NIR images. In the resulting RGB dataset, the images are supplemented with detailed water masks, obtained via a semi-supervised human-in-the-loop technique, where in particular the NIR information is leveraged to classify pixels as either water or non-water. We evaluate our dataset by training and testing established Deep Learning models for semantic segmentation. With BlessemFlood21 we provide labeled high-resolution RGB data and a baseline for further development of algorithmic solutions tailored to flood detection in RGB imagery.
Abstract:The bilateral asymmetry of flanks of animals with visual body marks that uniquely identify an individual, complicates tasks like population estimations. Automatically generated additional information on the visible side of the animal would improve the accuracy for individual identification. In this study we used transfer learning on popular CNN image classification architectures to train a flank predictor that predicts the visible flank of quadruped mammalian species in images. We automatically derived the data labels from existing datasets originally labeled for animal pose estimation. We trained the models in two phases with different degrees of retraining. The developed models were evaluated in different scenarios of different unknown quadruped species in known and unknown environments. As a real-world scenario, we used a dataset of manually labeled Eurasian lynx (Lynx lynx) from camera traps in the Bavarian Forest National Park to evaluate the model. The best model, trained on an EfficientNetV2 backbone, achieved an accuracy of 88.70 % for the unknown species lynx in a complex habitat.
Abstract:Analyses for biodiversity monitoring based on passive acoustic monitoring (PAM) recordings is time-consuming and challenged by the presence of background noise in recordings. Existing models for sound event detection (SED) worked only on certain avian species and the development of further models required labeled data. The developed framework automatically extracted labeled data from available platforms for selected avian species. The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models. The models were evaluated on unprocessed real world data recorded in urban KwaZulu-Natal habitats. The Adapted SED-CRNN model reached a F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions. The proposed approach to automatically extract labeled data for chosen avian species enables an easy adaption of PAM to other species and habitats for future conservation projects.
Abstract:Predominant methods for image-based drone detection frequently rely on employing generic object detection algorithms like YOLOv5. While proficient in identifying drones against homogeneous backgrounds, these algorithms often struggle in complex, highly textured environments. In such scenarios, drones seamlessly integrate into the background, creating camouflage effects that adversely affect the detection quality. To address this issue, we introduce a novel deep learning architecture called YOLO-FEDER FusionNet. Unlike conventional approaches, YOLO-FEDER FusionNet combines generic object detection methods with the specialized strength of camouflage object detection techniques to enhance drone detection capabilities. Comprehensive evaluations of YOLO-FEDER FusionNet show the efficiency of the proposed model and demonstrate substantial improvements in both reducing missed detections and false alarms.
Abstract:The automation of games using Deep Reinforcement Learning Strategies (DRL) is a well-known challenge in AI research. While for feature extraction in a video game typically the whole image is used, this is hardly practical for many real world games. Instead, using a smaller game state reducing the dimension of the parameter space to include essential parameters only seems to be a promising approach. In the game of Foosball, a compact and comprehensive game state description consists of the positional shifts and rotations of the figures and the position of the ball over time. In particular, velocities and accelerations can be derived from consecutive time samples of the game state. In this paper, a figure detection system to determine the game state in Foosball is presented. We capture a dataset containing the rotations of the rods which were measured using accelerometers and the positional shifts were derived using traditional Computer Vision techniques (in a laboratory setting). This dataset is utilized to train Convolutional Neural Network (CNN) based end-to-end regression models to predict the rotations and shifts of each rod. We present an evaluation of our system using different state-of-the-art CNNs as base architectures for the regression model. We show that our system is able to predict the game state with high accuracy. By providing data for both black and white teams, the presented system is intended to provide the required data for future developments of Imitation Learning techniques w.r.t. to observing human players.
Abstract:Computed Tomography (CT) is a widely used medical imaging modality, and as it is based on ionizing radiation, it is desirable to minimize the radiation dose. However, a reduced radiation dose comes with reduced image quality, and reconstruction from low-dose CT (LDCT) data is still a challenging task which is subject to research. According to the LoDoPaB-CT benchmark, a benchmark for LDCT reconstruction, many state-of-the-art methods use pipelines involving UNet-type architectures. Specifically the top ranking method, ItNet, employs a three-stage process involving filtered backprojection (FBP), a UNet trained on CT data, and an iterative refinement step. In this paper, we propose a less complex two-stage method. The first stage also employs FBP, while the novelty lies in the training strategy for the second stage, characterized as the CT image enhancement stage. The crucial point of our approach is that the neural network is pretrained on a distinctly different pretraining task with non-CT data, namely Gaussian noise removal on a variety of natural grayscale images (photographs). We then fine-tune this network for the downstream task of CT image enhancement using pairs of LDCT images and corresponding normal-dose CT images (NDCT). Despite being notably simpler than the state-of-the-art, as the pretraining did not depend on domain-specific CT data and no further iterative refinement step was necessary, the proposed two-stage method achieves competitive results. The proposed method achieves a shared top ranking in the LoDoPaB-CT challenge and a first position with respect to the SSIM metric.
Abstract:Magnetic particle imaging (MPI) is an emerging medical imaging modality which has gained increasing interest in recent years. Among the benefits of MPI are its high temporal resolution, and that the technique does not expose the specimen to any kind of ionizing radiation. It is based on the non-linear response of magnetic nanoparticles to an applied magnetic field. From the electric signal measured in receive coils, the particle concentration has to be reconstructed. Due to the ill-posedness of the reconstruction problem, various regularization methods have been proposed for reconstruction ranging from early stopping methods, via classical Tikhonov regularization and iterative methods to modern machine learning approaches. In this work, we contribute to the latter class: we propose a plug-and-play approach based on a generic zero-shot denoiser with an $\ell^1$-prior. Moreover, we develop parameter selection strategies. Finally, we quantitatively and qualitatively evaluate the proposed algorithmic scheme on the 3D Open MPI data set with different levels of preprocessing.
Abstract:The manual processing and analysis of videos from camera traps is time-consuming and includes several steps, ranging from the filtering of falsely triggered footage to identifying and re-identifying individuals. In this study, we developed a pipeline to automatically analyze videos from camera traps to identify individuals without requiring manual interaction. This pipeline applies to animal species with uniquely identifiable fur patterns and solitary behavior, such as leopards (Panthera pardus). We assumed that the same individual was seen throughout one triggered video sequence. With this assumption, multiple images could be assigned to an individual for the initial database filling without pre-labeling. The pipeline was based on well-established components from computer vision and deep learning, particularly convolutional neural networks (CNNs) and scale-invariant feature transform (SIFT) features. We augmented this basis by implementing additional components to substitute otherwise required human interactions. Based on the similarity between frames from the video material, clusters were formed that represented individuals bypassing the open set problem of the unknown total population. The pipeline was tested on a dataset of leopard videos collected by the Pan African Programme: The Cultured Chimpanzee (PanAf) and achieved a success rate of over 83% for correct matches between previously unknown individuals. The proposed pipeline can become a valuable tool for future conservation projects based on camera trap data, reducing the work of manual analysis for individual identification, when labeled data is unavailable.
Abstract:We consider reconstructing multi-channel images from measurements performed by photon-counting and energy-discriminating detectors in the setting of multi-spectral X-ray computed tomography (CT). Our aim is to exploit the strong structural correlation that is known to exist between the channels of multi-spectral CT images. To that end, we adopt the multi-channel Potts prior to jointly reconstruct all channels. This prior produces piecewise constant solutions with strongly correlated channels. In particular, edges are enforced to have the same spatial position across channels which is a benefit over TV-based methods. We consider the Potts prior in two frameworks: (a) in the context of a variational Potts model, and (b) in a Potts-superiorization approach that perturbs the iterates of a basic iterative least squares solver. We identify an alternating direction method of multipliers (ADMM) approach as well as a Potts-superiorized conjugate gradient method as particularly suitable. In numerical experiments, we compare the Potts prior based approaches to existing TV-type approaches on realistically simulated multi-spectral CT data and obtain improved reconstruction for compound solid bodies.