
Alessandro Zimmer

A Light Perspective for 3D Object Detection

Mar 10, 2025

Marcelo Eduardo Pederiva, José Mario De Martino, Alessandro Zimmer

Abstract: Comprehending the environment and accurately detecting objects in 3D space are essential for advancing autonomous vehicle technologies. Integrating Camera and LIDAR data has emerged as an effective approach for achieving high accuracy in 3D Object Detection models. However, existing methodologies often rely on heavy, traditional backbones that are computationally demanding. This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process, aiming to create more efficient models without compromising performance. Our model, NextBEV, surpasses established feature extractors such as ResNet50 and MobileNetV2. On the KITTI 3D Monocular detection benchmark, NextBEV achieves an accuracy improvement of 2.39% while having fewer than 10% of the parameters of MobileNetV3. Moreover, we propose changes to LIDAR backbones that reduce the original inference time to 10 ms. Additionally, by fusing these lightweight proposals, we improve the accuracy of the VoxelNet-based model by 2.93% and the F1-score of the PointPillar-based model by approximately 20%. This work therefore contributes lightweight yet powerful models for both individual-sensor and fusion techniques, making them more suitable for onboard implementation.

* Proc. SPIE 13517, Seventeenth International Conference on Machine Vision (ICMV 2024), 135170J (24 February 2025)
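This abstract does not describe NextBEV's exact layers. As a hedged illustration of why ConvNeXt-style feature extractors can be far lighter than classic backbones, the sketch below counts the parameters of a standard k×k convolution against a depthwise k×k convolution followed by a 1×1 pointwise convolution. The channel width (256), the 7×7 kernel, and the helper names are assumptions chosen for illustration, not values or code from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameters of a standard k x k convolution that mixes all input channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Depthwise k x k conv (one filter per channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    c, k = 256, 7  # hypothetical channel width and a ConvNeXt-style 7x7 kernel
    std = conv_params(c, c, k)
    sep = depthwise_separable_params(c, c, k)
    print(std, sep, f"{sep / std:.1%}")  # prints: 3211520 78592 2.4%
```

Ignoring biases, the ratio is roughly 1/C_out + 1/k², so wider layers and larger kernels both widen the gap between the two designs.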


MonoNext: A 3D Monocular Object Detection with ConvNext

Aug 01, 2023

Marcelo Eduardo Pederiva, José Mario De Martino, Alessandro Zimmer


Abstract: Autonomous driving perception tasks rely heavily on cameras as the primary sensor for Object Detection, Semantic Segmentation, Instance Segmentation, and Object Tracking. However, RGB images captured by cameras lack depth information, which poses a significant challenge for 3D detection tasks. To supply this missing information, range sensors such as LIDAR and RADAR are used for accurate 3D Object Detection. Despite their high accuracy, multi-sensor models are expensive and computationally demanding. In contrast, Monocular 3D Object Detection models are becoming increasingly popular, offering a faster, cheaper, and easier-to-implement solution for 3D detection. This paper introduces MonoNext, a different Multi-Task Learning approach that uses a spatial grid to map objects in the scene. MonoNext follows a straightforward design based on the ConvNext network and requires only data annotated with 3D bounding boxes. In our experiments on the KITTI dataset, MonoNext achieved high precision and performance competitive with state-of-the-art approaches. Furthermore, when trained with additional data, MonoNext improved on its own results and achieved higher accuracy.
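The abstract does not specify how MonoNext's spatial grid is defined. The following is a minimal sketch of one plausible grid encoding, assuming a bird's-eye-view grid over KITTI camera coordinates (x lateral, z depth) in which each object's centre is assigned to a single cell that stores an objectness flag and the centre's offset within that cell. The ranges, the 4 m cell size, and all names (`Object3D`, `encode`, `decode`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical bird's-eye-view grid in KITTI camera coordinates (metres):
# x points right (lateral), z points forward (depth).
X_MIN, X_MAX = -40.0, 40.0
Z_MIN, Z_MAX = 0.0, 80.0
CELL = 4.0  # cell size in metres (assumed, not from the paper)
COLS = int((X_MAX - X_MIN) / CELL)
ROWS = int((Z_MAX - Z_MIN) / CELL)


@dataclass
class Object3D:
    x: float  # box centre, lateral position
    z: float  # box centre, depth


def encode(objects: List[Object3D]) -> List[List[List[float]]]:
    """Build a ROWS x COLS target holding [objectness, dx, dz] per cell."""
    grid = [[[0.0, 0.0, 0.0] for _ in range(COLS)] for _ in range(ROWS)]
    for obj in objects:
        if not (X_MIN <= obj.x < X_MAX and Z_MIN <= obj.z < Z_MAX):
            continue  # object lies outside the mapped area
        gx = (obj.x - X_MIN) / CELL
        gz = (obj.z - Z_MIN) / CELL
        col, row = int(gx), int(gz)
        # If two objects share a cell, the later one overwrites the earlier one.
        grid[row][col] = [1.0, gx - col, gz - row]
    return grid


def decode(grid: List[List[List[float]]], threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Recover (x, z) centres from cells whose objectness exceeds the threshold."""
    centres = []
    for row, cells in enumerate(grid):
        for col, (score, dx, dz) in enumerate(cells):
            if score > threshold:
                centres.append((X_MIN + (col + dx) * CELL, Z_MIN + (row + dz) * CELL))
    return centres


if __name__ == "__main__":
    # The second object lies beyond X_MAX and is therefore dropped.
    targets = encode([Object3D(x=1.5, z=10.0), Object3D(x=50.0, z=10.0)])
    print(decode(targets))
```

Storing sub-cell offsets lets the decoder recover a centre exactly instead of snapping it to a cell corner; a multi-task head would typically add further per-cell targets, such as box dimensions and orientation.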


Texture CNN for Thermoelectric Metal Pipe Image Classification

May 28, 2019

Daniel Vriesman, Alessandro Zimmer, Alceu S. Britto Jr., Alessandro L. Koerich


Abstract: In this paper, representation learning based on deep neural networks is applied as an alternative to handcrafted features in a method for the automatic visual inspection of corroded thermoelectric metal pipes. A texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase Quantization (LPQ) and Haralick descriptors (HD), with the advantage of learning an appropriate textural representation and the decision boundaries in a single optimization process. Experimental results show that it is possible to reach an accuracy of 99.20% in identifying different levels of corrosion on the internal surface of thermoelectric pipe walls. Because the TCNN architecture is compact in terms of layers and connections, it also requires much less parameter-tuning effort than the handcrafted approach. These results open up the possibility of using deep neural networks in real-time applications such as the automatic inspection of thermoelectric metal pipes.
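The abstract does not detail the TCNN layers. Assuming it follows the texture-CNN (T-CNN) idea of Andrearczyk and Whelan, in which each convolutional feature map is reduced to a single "energy" value by averaging, the pure-Python sketch below shows how that pooling yields a fixed-length descriptor that ignores where a pattern occurs in the image. The filters, the toy images, and the function names are illustrative assumptions, not the paper's trained network.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation, the operation CNN conv layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise rectification of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def energy(fmap: Matrix) -> float:
    """Energy pooling: mean response of one feature map; spatial layout is discarded."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def texture_descriptor(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One energy value per filter, giving a fixed-length vector for a classifier."""
    return [energy(relu(conv2d_valid(image, k))) for k in kernels]


if __name__ == "__main__":
    # Toy 4x4 images with a dark-to-bright vertical edge at different columns.
    edge_right = [[0, 0, 1, 1] for _ in range(4)]
    edge_left = [[0, 1, 1, 1] for _ in range(4)]
    rising = [[-1, 1]]   # responds to dark-to-bright transitions
    falling = [[1, -1]]  # responds to bright-to-dark transitions
    print(texture_descriptor(edge_right, [rising, falling]))
    print(texture_descriptor(edge_left, [rising, falling]))
```

Both toy images yield the same descriptor even though the edge sits in a different column. This position invariance is what makes energy pooling suited to texture, rather than object-shape, classification.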
