IFREMER
Abstract: The joint use of event-based vision and Spiking Neural Networks (SNNs) is expected to have a large impact on robotics in the near future, in tasks such as visual odometry and obstacle avoidance. While researchers have used real-world event datasets for optical flow prediction (mostly captured with Unmanned Aerial Vehicles (UAVs)), these datasets are limited in diversity and scalability, and are challenging to collect. Synthetic datasets therefore offer a scalable alternative that bridges the gap between simulation and reality. In this work, we address the lack of datasets by introducing eWiz, a comprehensive library for processing event-based data. It includes tools for data loading, augmentation, visualization, encoding, and training-data generation, along with loss functions and performance metrics. We further present a synthetic event-based dataset and data generation pipeline for optical flow prediction tasks. Built on top of eWiz, eCARLA-scenes uses the CARLA simulator to simulate self-driving car scenarios. The ultimate goal of this dataset is to depict diverse environments while laying a foundation for advancing event-based camera applications in autonomous field vehicle navigation, paving the way for using SNNs on neuromorphic hardware such as the Intel Loihi.
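As an illustration of the kind of encoding step such a library automates, the sketch below converts a raw event stream (x, y, timestamp, polarity) into a voxel-grid tensor of the sort commonly used as network input for optical flow prediction. The function name, its arguments, and the synthetic event stream are illustrative assumptions and do not reflect eWiz's actual API.

```python
# Illustrative sketch only: a generic event-to-voxel-grid encoder of the kind an
# event-processing library like eWiz provides. Names are hypothetical, not eWiz's API.
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate events (x, y, t, polarity) into a (num_bins, H, W) voxel grid
    with bilinear interpolation along the time axis."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to [0, num_bins - 1]
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    dt = t_norm - t0
    pol = np.where(ps > 0, 1.0, -1.0)
    # Split each event's polarity between its two neighbouring temporal bins
    for b, w in ((t0, 1.0 - dt), (np.minimum(t0 + 1, num_bins - 1), dt)):
        np.add.at(voxel, (b, ys, xs), pol * w)
    return voxel

# Usage: 10k synthetic events on a 260x346 sensor (DAVIS346-like resolution)
rng = np.random.default_rng(0)
xs = rng.integers(0, 346, 10_000)
ys = rng.integers(0, 260, 10_000)
ts = np.sort(rng.uniform(0.0, 0.05, 10_000))   # 50 ms window
ps = rng.choice([-1, 1], 10_000)
grid = events_to_voxel_grid(xs, ys, ts, ps, num_bins=5, height=260, width=346)
print(grid.shape)  # (5, 260, 346)
```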
Abstract: Distinguishing among different marine benthic habitat characteristics is of key importance in a wide range of seabed operations, from the installation of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework, and in doing so we also evaluate the applicability of ViTs on smaller datasets. To compensate for the lack of CNN-like inductive biases, and thereby make ViTs more amenable to low-data regimes, we propose a novel feature extraction module to replace the Multi-Layer Perceptron (MLP) block within the transformer layers, as well as a novel module to extract multiscale patch embeddings. A lightweight decoder complements this design to further boost multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results while meeting real-time computational requirements. We make our code available at \url{https://github.com/hayatrajani/s3seg-vit}.
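To make the overall encoder-decoder layout concrete, the sketch below assembles a standard ViT-style encoder over patch embeddings with a lightweight convolutional decoder for pixel-wise classification in PyTorch. It deliberately uses stock components: the paper's custom feature-extraction module, multiscale patch embeddings, and decoder design are not reproduced here, and all class and parameter names are illustrative.

```python
# Minimal sketch of a ViT-style encoder with a lightweight convolutional decoder for
# pixel-wise seafloor classification. Stock PyTorch components only; the paper's
# custom modules are NOT reproduced here.
import torch
import torch.nn as nn

class ViTSegSketch(nn.Module):
    def __init__(self, in_ch=1, num_classes=5, img_size=256, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)  # patch embedding
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))  # learned positions
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        # Lightweight decoder: progressively upsample the token map back to full resolution
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.GELU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim // 2, num_classes, 3, padding=1),
        )

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos   # (B, N, dim)
        tokens = self.encoder(tokens)
        feat = tokens.transpose(1, 2).reshape(x.shape[0], -1, self.grid, self.grid)
        return self.decoder(feat)  # (B, num_classes, H, W)

logits = ViTSegSketch()(torch.randn(2, 1, 256, 256))   # single-channel SSS intensity map
print(logits.shape)  # torch.Size([2, 5, 256, 256])
```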
Abstract: Hyperspectral imaging has been increasingly used for underwater survey applications in recent years. As many hyperspectral cameras work as push-broom scanners, their use is usually limited to the creation of photo-mosaics based on a flat-surface approximation and on camera poses interpolated from dead-reckoning navigation. Yet, because of drift in the navigation and the often invalid flat-surface assumption, the quality of the resulting photo-mosaics is frequently too low to support adequate analysis. In this paper we present an initial method for creating hyperspectral 3D reconstructions of underwater environments. By fusing the data gathered by a classical RGB camera, an inertial navigation system, and a hyperspectral push-broom camera, we show that the proposed method creates highly accurate 3D reconstructions with hyperspectral textures. We combine techniques from simultaneous localization and mapping, structure-from-motion, and 3D reconstruction, and use them to create 3D models with hyperspectral texture, allowing us to overcome the flat-surface assumption and the classical limitations of dead-reckoning navigation.
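One building block such a fusion pipeline requires is per-scan-line pose interpolation: a push-broom camera captures a single line per exposure, so its pose must be interpolated between timestamped keyframe poses (e.g. from SLAM/SfM) before the line's pixels can be ray-cast onto the 3D mesh to paint the hyperspectral texture. The sketch below shows this step under simple assumptions (linear interpolation for position, SLERP for orientation, a fan of viewing rays); all names and numbers are illustrative and not taken from the paper.

```python
# Illustrative sketch: interpolate the pose of one push-broom scan line between two
# timestamped keyframe poses, then build its viewing rays in the world frame.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(t, t0, t1, p0, p1, R0, R1):
    """Linear interpolation for position, SLERP for orientation."""
    a = (t - t0) / (t1 - t0)
    position = (1.0 - a) * p0 + a * p1
    rots = Rotation.from_quat([R0.as_quat(), R1.as_quat()])
    rotation = Slerp([t0, t1], rots)(t)
    return position, rotation

def scanline_rays(rotation, position, n_px=640, fov_deg=40.0):
    """Viewing rays of one push-broom scan line (1 x n_px pixels) in the world frame."""
    angles = np.deg2rad(np.linspace(-fov_deg / 2, fov_deg / 2, n_px))
    dirs_cam = np.stack([np.sin(angles), np.zeros(n_px), np.cos(angles)], axis=1)
    return position, rotation.apply(dirs_cam)   # origin (3,), directions (n_px, 3)

# Keyframe poses at t=0 s and t=1 s; scan line captured at t=0.37 s
p0, p1 = np.array([0.0, 0.0, -10.0]), np.array([1.0, 0.0, -10.0])
R0, R1 = Rotation.identity(), Rotation.from_euler("z", 5, degrees=True)
pos, rot = interpolate_pose(0.37, 0.0, 1.0, p0, p1, R0, R1)
origin, directions = scanline_rays(rot, pos)
print(pos, directions.shape)   # interpolated position, (640, 3) ray directions
```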
Abstract: Recent advances in structure-from-motion techniques are enabling many scientific fields to benefit from the routine creation of detailed 3D models. However, for a large number of applications, only a single camera is available, due to cost or space constraints on the survey platform. Monocular structure-from-motion raises the issue of properly estimating the scale of the 3D models so that they can later be used for metrology. The scale can be determined from visible objects of known dimensions, or from information on the magnitude of the camera motion provided by other sensors, such as GPS. This paper addresses the problem of accurately scaling 3D models created from monocular cameras in GPS-denied environments, such as underwater applications. Motivated by the common availability of underwater laser scalers, we present two novel approaches. A fully-calibrated method enables the use of arbitrary laser setups, while a partially-calibrated method reduces the need for calibration by assuming only that the laser beams are parallel, with no constraints on the camera. The proposed methods have several advantages over existing methods: the need to align the lasers with the optical axis of the camera is removed, as is the extremely error-prone manual identification of image points on the 3D model. The performance and applicability of the methods were evaluated both on data generated from a realistic 3D model and on data collected during an oceanographic cruise in 2017. Three separate laser configurations, encompassing nearly all practical laser setups, were tested to evaluate the effects of terrain roughness, noise, camera perspective angle, and camera-scene distance. In the real scenario, the computation of six independent model-scale estimates using our fully-calibrated approach produced values with a standard deviation of 0.3%.
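The core geometric idea behind scaling with a pair of parallel laser beams can be sketched as follows: intersect the laser rays with the unscaled SfM model, measure the spot separation on the model, and compare it with the lasers' known physical spacing. The code below is a deliberately simplified illustration of this principle under idealized assumptions (a single triangle of terrain, lasers parallel to the viewing direction) and does not reproduce the paper's fully- or partially-calibrated formulations.

```python
# Simplified geometric sketch of laser-based scale estimation, not the paper's method:
# intersect two parallel laser rays with the unscaled SfM model and compare the spot
# separation on the model with the lasers' known physical spacing.
import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore ray/triangle intersection; returns the hit point or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None
    t_vec = origin - v0
    u = (t_vec @ p) / det
    q = np.cross(t_vec, e1)
    v = (direction @ q) / det
    t = (e2 @ q) / det
    if u < 0 or v < 0 or u + v > 1 or t < 0:
        return None
    return origin + t * direction

# Unscaled SfM model: one slanted triangle of seafloor, ~2.7x smaller than reality
true_scale = 2.7
tri = [np.array(v) / true_scale for v in ([-5.0, -5.0, 3.0], [5.0, -5.0, 3.2], [0.0, 8.0, 2.9])]

# Two parallel lasers, 0.10 m apart in reality, both pointing along +z in the model frame
laser_sep_m = 0.10
d = np.array([0.0, 0.0, 1.0])
o1 = np.array([-laser_sep_m / 2 / true_scale, 0.0, 0.0])
o2 = np.array([+laser_sep_m / 2 / true_scale, 0.0, 0.0])
p1, p2 = ray_triangle(o1, d, *tri), ray_triangle(o2, d, *tri)

model_sep = np.linalg.norm(p1 - p2)                   # spot separation on the unscaled model
print("estimated scale:", laser_sep_m / model_sep)    # ~2.7
```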