Abstract: The VVC codec is applied to the task of multispectral image (MSI) compression using adaptive and scalable coding structures. In a 'plain' VVC approach, concepts from picture-to-picture temporal prediction are employed for decorrelation along the MSI's spectral dimension. The popular principal component analysis (PCA) for spectral decorrelation is further evaluated in combination with VVC intra-coding for spatial decorrelation. This approach is referred to as PCA-VVC. A novel adaptive MSI compression algorithm, named HPCLS, is introduced that uses PCA and inter-prediction for spectral decorrelation and VVC intra-coding for spatial decorrelation. Further, a novel adaptive scalable approach is proposed that provides a separately decodable, spectrally scaled preview of the MSI in the compressed file. Information contained in the preview is exploited to reduce the overall file size. All schemes are evaluated on images from the ARAD HS data set containing outdoor scenes with a high variety in brightness and color. 'Plain' VVC is outperformed by both PCA-VVC and HPCLS. HPCLS shows advantageous rate-distortion (RD) behavior compared to PCA-VVC for reconstruction quality above 51 dB PSNR. The performance of the scalable approach is compared to the combination of an independent RGB preview and one of HPCLS or PCA-VVC. The scalable approach shows a significant benefit, especially at higher preview qualities.
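The spectral decorrelation step behind PCA-VVC can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pca_forward`, `pca_inverse`, `psnr`), the `(H, W, B)` array layout, and the choice of an eigendecomposition of the band covariance are assumptions made for illustration. The VVC intra-coding of the resulting component planes is omitted entirely.

```python
import numpy as np

def pca_forward(msi, n_components):
    """Project an (H, W, B) image onto its leading spectral principal components."""
    h, w, b = msi.shape
    pixels = msi.reshape(-1, b).astype(np.float64)  # one row per pixel, one column per band
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = centered.T @ centered / (pixels.shape[0] - 1)  # (B, B) band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    basis = eigvecs[:, order]  # (B, k) orthonormal basis
    comps = (centered @ basis).reshape(h, w, n_components)
    return comps, basis, mean

def pca_inverse(comps, basis, mean):
    """Map (H, W, k) component planes back to (H, W, B) spectral bands."""
    h, w, k = comps.shape
    pixels = comps.reshape(-1, k) @ basis.T + mean
    return pixels.reshape(h, w, basis.shape[0])

def psnr(reference, reconstruction, peak):
    """Peak signal-to-noise ratio in dB for a given peak signal value."""
    mse = np.mean((np.asarray(reference, dtype=np.float64) - reconstruction) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

In a codec pipeline, each component plane would be quantized and passed to an image coder, and the basis and mean would have to be stored as side information. Dropping trailing components trades reconstruction quality for rate, which `psnr` makes measurable.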
Abstract: LiDAR odometry (LO) describes the task of finding an alignment of subsequent LiDAR point clouds. This alignment can be used to estimate the motion of the platform on which the LiDAR sensor is mounted. Currently, the state-of-the-art algorithms on the well-known KITTI Vision Benchmark Suite are non-learning approaches. We propose a network architecture that learns LO by directly processing 3D point clouds. It is trained on the KITTI dataset in an end-to-end manner without the need to pre-define corresponding pairs of points. An evaluation on the KITTI Vision Benchmark Suite shows similar performance to a previously published work, DeepCLR [1], even though our model uses only around 3.56% of its number of network parameters. Furthermore, a plane point extraction is applied, which leads to a marginal performance decrease while reducing the input size by up to 50%.
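To illustrate how pairwise alignments turn into a motion estimate, the following sketch chains relative rigid transforms into a trajectory. It is a hedged illustration, not the proposed network: the helper names (`make_transform`, `matmul4`, `accumulate_poses`), the restriction to yaw-only rotation, and the right-multiplication convention (each relative transform expresses the pose of scan k in the frame of scan k-1) are all assumptions.

```python
import math

def make_transform(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotation about z by `yaw` (radians) plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def accumulate_poses(relative_transforms):
    """Chain relative transforms into global poses, starting from the identity."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    trajectory = [pose]
    for rel in relative_transforms:
        pose = matmul4(pose, rel)  # assumed convention: pose_k = pose_{k-1} * T_{k-1,k}
        trajectory.append(pose)
    return trajectory
```

As a usage example, four identical steps of `make_transform(math.pi / 2, 1.0, 0.0, 0.0)` trace a unit square and return to the start, up to floating-point error. Because each pose is a product of all earlier estimates, any per-pair alignment error accumulates along the trajectory.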