Karl Amende

WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving

May 04, 2019

Senthil Yogamani, Ciaran Hughes, Jonathan Horgan, Ganesh Sistu, Padraig Varley, Derek O'Dea, Michal Uricar, Stefan Milz, Martin Simon, Karl Amende(+7 more)

Abstract: Fisheye cameras are commonly employed to obtain a large field of view in surveillance, augmented reality and, in particular, automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for other tasks are provided for over 100,000 images. We would like to encourage the community to adapt computer vision models to fisheye cameras instead of relying on naive rectification.

* The dataset and code for baseline experiments will be provided in stages upon publication of this paper

Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds

Apr 16, 2019

Martin Simon, Karl Amende, Andrea Kraus, Jens Honer, Timo Sämann, Hauke Kaulbersch, Stefan Milz, Horst Michael Gross

Abstract: Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of a neural-network-based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time by up to 20% and halves training time. On top of this, we apply state-of-the-art online multi-target feature tracking to the object measurements to further increase accuracy and robustness by utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories, while maintaining the performance and accuracy trade-off and still running in real time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection.

Efficient Semantic Segmentation for Visual Bird's-eye View Interpretation

Nov 29, 2018

Timo Sämann, Karl Amende, Stefan Milz, Christian Witt, Martin Simon, Johannes Petzold

Abstract: The ability to perform semantic segmentation in real-time applications on limited hardware is of great importance. One such application is the interpretation of the visual bird's-eye view, which requires semantic segmentation of the four omnidirectional camera images. In this paper, we present an efficient semantic segmentation approach that sets new standards in terms of runtime and hardware requirements. Our two main contributions are reducing the runtime by parallelizing the ArgMax layer and reducing hardware requirements by applying channel pruning to the ENet model.
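
To make the role of the ArgMax layer concrete, here is a minimal sketch of what such a layer computes in a segmentation network: for every pixel, the index of the highest-scoring class. This is not the authors' implementation. The function name `argmax_labels`, the nested-list layout `scores[class][y][x]` and the toy scores are assumptions chosen for illustration only.

```python
from typing import List


def argmax_labels(scores: List[List[List[float]]]) -> List[List[int]]:
    """Map per-class score maps scores[c][y][x] to a label map labels[y][x].

    Hypothetical helper for illustration; ties go to the lowest class index.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each pixel is resolved independently of all other pixels.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        labels.append(row)
    return labels


# Toy 2x2 image with three classes (made-up values).
scores = [
    [[0.1, 0.7], [0.2, 0.3]],  # class 0
    [[0.8, 0.2], [0.1, 0.3]],  # class 1
    [[0.1, 0.1], [0.7, 0.4]],  # class 2
]
labels = argmax_labels(scores)
print(labels)  # [[1, 0], [2, 2]]
```

Because no pixel depends on any other, the two outer loops can be distributed across threads or GPU lanes, which is the property a parallelized ArgMax would exploit.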

* Advances in Intelligent Systems and Computing 2018

Monocular Fisheye Camera Depth Estimation Using Sparse LiDAR Supervision

Sep 24, 2018

Varun Ravi Kumar, Stefan Milz, Martin Simon, Christian Witt, Karl Amende, Johannes Petzold, Senthil Yogamani, Timo Pech

Abstract: Near-field depth estimation around a self-driving car is an important function that can be achieved by four wide-angle fisheye cameras, each with a field of view of over 180 degrees. Depth estimation based on convolutional neural networks (CNNs) produces state-of-the-art results, but progress is hindered because depth annotation cannot be obtained manually. Synthetic datasets are commonly used, but they have limitations. For instance, they do not capture the extensive variability in the appearance of objects, such as vehicles, present in real datasets. There is also a domain shift when performing inference on natural images, as illustrated by the many attempts to handle domain adaptation explicitly. In this work, we explore an alternative approach: training with sparse LiDAR data as ground truth for depth estimation with a fisheye camera. We built our own dataset using our self-driving car setup, which has a 64-beam Velodyne LiDAR and four wide-angle fisheye cameras. To handle the difference in viewpoints between the LiDAR and the fisheye camera, an occlusion resolution mechanism was implemented. We started with Eigen's multiscale convolutional network architecture and improved it by modifying the activation function and optimizer. We obtained promising results on our dataset, with RMSE values comparable to the state-of-the-art results obtained on KITTI.
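
Sparse LiDAR ground truth covers only a fraction of the image pixels, so an error measure such as RMSE is usually computed over the pixels that actually have a LiDAR return. The sketch below illustrates that idea under stated assumptions; the abstract does not say how the authors mask their metric. The function name `sparse_rmse`, the flat-list layout and the use of `None` for "no LiDAR return" are all hypothetical.

```python
import math
from typing import List, Optional


def sparse_rmse(pred: List[float], lidar: List[Optional[float]]) -> float:
    """RMSE between predicted and LiDAR depth, ignoring pixels without a return.

    Hypothetical helper: None marks a pixel with no LiDAR measurement.
    """
    squared_errors = [
        (p - g) ** 2 for p, g in zip(pred, lidar) if g is not None
    ]
    if not squared_errors:
        raise ValueError("no valid LiDAR points to evaluate against")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up depths in metres; only two of the four pixels have ground truth.
pred = [2.0, 5.0, 9.0, 4.0]
lidar = [None, 4.0, None, 7.0]
rmse = sparse_rmse(pred, lidar)
print(round(rmse, 3))  # 2.236
```

The two valid pixels have errors of 1 m and 3 m, so the mean squared error is 5 and the RMSE is its square root. Predictions at pixels without a return do not affect the score.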

Complex-YOLO: Real-time 3D Object Detection on Point Clouds

Sep 24, 2018

Martin Simon, Stefan Milz, Karl Amende, Horst-Michael Gross

Abstract: Lidar-based 3D object detection is indispensable for autonomous driving, because it directly links to environmental understanding and therefore builds the base for prediction and motion planning. Real-time inference on highly sparse 3D data is an ill-posed problem for many other application areas besides automated vehicles, e.g. augmented reality, personal robotics or industrial automation. We introduce Complex-YOLO, a state-of-the-art real-time 3D object detection network that operates on point clouds only. In this work, we describe a network that expands YOLOv2, a fast standard 2D object detector for RGB images, with a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. To this end, we propose a specific Euler-Region-Proposal Network (E-RPN) that estimates the pose of the object by adding an imaginary and a real component to the regression network. This results in a closed complex space and avoids the singularities that occur with single-angle estimation. The E-RPN helps the network generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection, specifically in terms of efficiency. We achieve state-of-the-art results for cars, pedestrians and cyclists while being more than five times faster than the fastest competitor. Further, our model is capable of estimating all eight KITTI classes, including vans, trucks and sitting pedestrians, simultaneously and with high accuracy.
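
The idea of regressing an angle through a real and an imaginary value can be illustrated with a few lines of arithmetic. The sketch below is not the E-RPN itself. It only shows, under the assumption that the orientation is recovered with a two-argument arctangent, why a (real, imaginary) target is continuous where a single angle wraps around. The function names `encode_angle` and `decode_angle` are hypothetical.

```python
import math
from typing import Tuple


def encode_angle(phi: float) -> Tuple[float, float]:
    """Represent an orientation as a point (re, im) on the unit circle."""
    return (math.cos(phi), math.sin(phi))


def decode_angle(re: float, im: float) -> float:
    """Recover the orientation in (-pi, pi] from its real and imaginary values."""
    return math.atan2(im, re)


# Two headings that are almost identical physically but sit on opposite
# sides of the wrap-around at +/- pi.
a = math.pi - 0.01
b = -math.pi + 0.01

angle_gap = abs(a - b)  # close to 2*pi: a large regression error
ea, eb = encode_angle(a), encode_angle(b)
complex_gap = math.hypot(ea[0] - eb[0], ea[1] - eb[1])  # small

recovered = decode_angle(*encode_angle(1.2))

print(round(angle_gap, 2))    # 6.26
print(round(complex_gap, 2))  # 0.02
```

A loss on the raw angle would treat the two headings as far apart, while a loss on the (real, imaginary) pair treats them as close, which is the singularity the abstract refers to.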
