Abstract: We introduce a simple yet effective fusion method of LiDAR and RGB data to segment LiDAR point clouds. Utilizing the dense native range representation of a LiDAR sensor and the setup calibration, we establish point correspondences between the two input modalities. Subsequently, we warp and fuse the features from one domain into the other, allowing us to jointly exploit information from both data sources within a single network. To show the merit of our method, we extend SqueezeSeg, a point cloud segmentation network, with an RGB feature branch and fuse it into the original structure. Our extension, called FuseSeg, leads to an improvement of up to 18% IoU on the KITTI benchmark. In addition to the improved accuracy, we also achieve real-time performance at 50 fps, five times as fast as the KITTI LiDAR data recording speed.
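As an illustration of the warping step, the minimal sketch below projects the 3D points of the LiDAR range image into the camera image with a calibration matrix and samples the RGB feature map at the resulting pixel positions. The function name, the tensor shapes, and the projection matrix `P_cam` are illustrative assumptions, not the actual FuseSeg implementation.

    # Sketch: warp RGB features into the LiDAR range image via the calibration
    # (shapes and names are assumptions, not the FuseSeg code).
    import torch
    import torch.nn.functional as F

    def warp_rgb_features(rgb_feat, points_xyz, P_cam, img_size):
        """rgb_feat:   (1, C, H_img, W_img) feature map from the RGB branch
           points_xyz: (H_rng, W_rng, 3) 3D points of the LiDAR range image
           P_cam:      (3, 4) camera projection matrix from the calibration
           img_size:   (H_img, W_img) of the camera image"""
        H, W, _ = points_xyz.shape
        pts = torch.cat([points_xyz.reshape(-1, 3),
                         torch.ones(H * W, 1)], dim=1)           # homogeneous coords
        uvz = (P_cam @ pts.T).T                                   # project into the image
        uv = uvz[:, :2] / uvz[:, 2:3].clamp(min=1e-6)             # pixel coordinates
        # normalize pixel coordinates to [-1, 1] for grid_sample
        grid = torch.empty(1, H, W, 2)
        grid[..., 0] = (uv[:, 0].reshape(H, W) / (img_size[1] - 1)) * 2 - 1
        grid[..., 1] = (uv[:, 1].reshape(H, W) / (img_size[0] - 1)) * 2 - 1
        # sample RGB features at the corresponding pixels -> (1, C, H_rng, W_rng)
        return F.grid_sample(rgb_feat, grid, align_corners=True)

The warped map could then be fused with the LiDAR branch features at the corresponding network stage, for example by channel-wise concatenation.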
Abstract: Automated toll systems rely on proper classification of the passing vehicles. This is especially difficult when the images used for classification only cover parts of the vehicle. To obtain information about the whole vehicle, we reconstruct it as a 3D object and exploit this additional information within a Convolutional Neural Network (CNN). However, when using deep networks for 3D object classification, large amounts of dense 3D models are required for good accuracy, which are often neither available nor feasible to process due to memory requirements. Therefore, in our method we reproject the 3D object onto the image plane using the reconstructed points, lines, or both. We utilize this sparse depth prior within an auxiliary network branch that acts as a regularizer during training. We show that this auxiliary regularizer helps to improve accuracy compared to 2D classification on a real-world dataset. Furthermore, due to the design of the network, only the 2D camera images are required for classification at test time, which enables the usage in portable computer vision systems.
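The sketch below illustrates the training-time regularization idea under simple assumptions: a shared backbone feeds a classification head and an auxiliary head that regresses the reprojected sparse depth, and the auxiliary loss is only computed when the depth prior is supplied, so at test time the network runs on the 2D image alone. The layer sizes, the loss, and the weighting factor `lambda_aux` are assumptions, not the paper's architecture.

    # Sketch: classifier with an auxiliary sparse-depth branch used only during
    # training (architecture details are assumptions, not the paper's network).
    import torch
    import torch.nn as nn

    class AuxRegularizedClassifier(nn.Module):
        def __init__(self, num_classes, lambda_aux=0.1):
            super().__init__()
            self.backbone = nn.Sequential(                  # shared 2D feature extractor
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(64, num_classes))
            self.aux_head = nn.Conv2d(64, 1, 1)             # regresses the sparse depth prior
            self.lambda_aux = lambda_aux

        def forward(self, image, sparse_depth=None, mask=None):
            """sparse_depth/mask: depth prior and its validity mask, given at the
               feature-map resolution; both are None at test time."""
            feat = self.backbone(image)
            logits = self.cls_head(feat)
            if sparse_depth is None:                        # test time: image only
                return logits
            depth_pred = self.aux_head(feat)
            # penalize depth errors only where the reprojected prior is defined
            aux_loss = ((depth_pred - sparse_depth) ** 2 * mask).sum() / mask.sum().clamp(min=1)
            return logits, self.lambda_aux * aux_loss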
Abstract: Learning similarity functions between image pairs with deep neural networks yields highly correlated activations of embeddings. In this work, we show how to improve the robustness of such embeddings by exploiting the independence within ensembles. To this end, we divide the last embedding layer of a deep network into an embedding ensemble and formulate training this ensemble as an online gradient boosting problem. Each learner receives a reweighted training sample from the previous learners. Further, we propose two loss functions which increase the diversity in our ensemble. These loss functions can be applied either for weight initialization or during training. Together, our contributions leverage large embedding sizes more effectively by significantly reducing the correlation of the embedding and consequently increasing its retrieval accuracy. Our method works with any differentiable loss function and does not introduce any additional parameters at test time. We evaluate our metric learning method on image retrieval tasks and show that it improves over state-of-the-art methods on the CUB-200-2011, Cars-196, Stanford Online Products, In-Shop Clothes Retrieval, and VehicleID datasets.
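A minimal sketch of the online boosting idea, assuming the last embedding layer is split into equally sized chunks that are trained with a simple contrastive surrogate loss; the per-sample reweighting rule below is a simplified assumption of how harder samples are passed on to the next learner, not the paper's exact formulation.

    # Sketch: embedding split into boosted learners with per-sample reweighting
    # (the surrogate loss and reweighting rule are simplifying assumptions).
    import torch
    import torch.nn.functional as F

    def boosted_embedding_loss(embeddings, labels, num_learners=4):
        """embeddings: (B, D) output of the last layer, split into equal chunks."""
        chunks = embeddings.chunk(num_learners, dim=1)
        weights = torch.ones(embeddings.size(0), device=embeddings.device)
        total = 0.0
        for chunk in chunks:
            emb = F.normalize(chunk, dim=1)
            sim = emb @ emb.T                               # pairwise cosine similarities
            pos = labels.unsqueeze(0) == labels.unsqueeze(1)
            # per-sample surrogate: penalize similar negatives, reward similar positives
            per_sample = (sim.masked_fill(pos, 0).clamp(min=0).sum(1)
                          - sim.masked_fill(~pos, 0).sum(1) / pos.float().sum(1))
            total = total + (weights.detach() * per_sample).mean()
            # samples the current learner handles badly get larger weight next round
            weights = torch.softmax(per_sample.detach(), dim=0) * per_sample.size(0)
        return total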
Abstract: Detection of partially occluded objects is a challenging computer vision problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of the detection window are occluded, since not every sub-part of the window is discriminative on its own. To address this issue, we propose a novel loss layer for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a convolution layer independently rather than over the whole feature map. This results in parts being more discriminative on their own, enabling the detector to recover if the detection window is partially occluded. By mapping our loss layer back to a regular fully connected layer, no additional computational cost is incurred at runtime compared to standard CNNs. We demonstrate our method for face detection on several public face detection benchmarks and show that it outperforms regular CNNs, is suitable for real-time applications, and achieves state-of-the-art performance.
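The sketch below illustrates the core idea of a grid loss: the detection loss is applied to every sub-block of the feature map independently in addition to the loss over the whole map, so each part must classify the window on its own. The shared 1x1 convolution scorer, the block size, and the logistic losses are simplifying assumptions rather than the exact formulation of the paper.

    # Sketch: per-block loss on conv features in addition to the full-map loss
    # (scorer, block size, and loss type are simplifying assumptions).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GridLoss(nn.Module):
        def __init__(self, channels, block_size=4):
            super().__init__()
            self.block_size = block_size
            self.block_scorer = nn.Conv2d(channels, 1, kernel_size=1)  # shared scorer

        def forward(self, feat, target):
            """feat: (B, C, H, W) conv features with H, W divisible by block_size,
               target: (B,) labels in {0, 1} (face / non-face)."""
            b = self.block_size
            scores = self.block_scorer(feat)                         # (B, 1, H, W)
            # average the scores inside each b x b block -> one score per block
            block_scores = F.avg_pool2d(scores, b, stride=b)         # (B, 1, H/b, W/b)
            t = target.float().view(-1, 1, 1, 1).expand_as(block_scores)
            # every block must classify the detection window correctly on its own
            part_loss = F.binary_cross_entropy_with_logits(block_scores, t)
            full_loss = F.binary_cross_entropy_with_logits(scores.mean(dim=[2, 3]),
                                                           target.float().unsqueeze(1))
            return full_loss + part_loss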
Abstract: Activity recognition in sport is an attractive field for computer vision research. Game, player, and team analysis are of great interest, and research topics within this field emerge with the goal of automated analysis. The very specific underlying rules of sports can be used as prior knowledge for the recognition task and present a constrained environment for evaluation. This paper describes the recognition of single player activities in sport, with special emphasis on volleyball. Starting from a per-frame, player-centered activity recognition, we incorporate geometry and contextual information via an activity context descriptor that collects information about all players' activities over a certain timespan relative to the investigated player. The benefit of this context information on single player activity recognition is evaluated on our new real-life dataset comprising almost 36k annotated frames with 7 activity classes within 6 videos of professional volleyball games. Incorporating the contextual information improves the average player-centered classification performance of 77.56% by up to 18.35% on specific classes, showing that spatio-temporal context is an important cue for activity recognition.
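A minimal sketch of an activity context descriptor, assuming per-frame activity probabilities and court positions of all players are available: the other players' activities inside a temporal window are accumulated and weighted by their distance to the investigated player. The Gaussian distance weighting and the window size are illustrative assumptions, not the descriptor defined in the paper.

    # Sketch: accumulate the other players' activities around a reference player
    # over a temporal window (weighting and window size are assumptions).
    import numpy as np

    def activity_context_descriptor(activities, positions, player_idx,
                                    frame_idx, window=15, sigma=5.0):
        """activities: (T, P, C) per-frame activity probabilities of P players
           positions:  (T, P, 2) player positions on the court
           returns a length-C context descriptor for the investigated player."""
        T, P, C = activities.shape
        lo, hi = max(0, frame_idx - window), min(T, frame_idx + window + 1)
        descriptor = np.zeros(C)
        for t in range(lo, hi):
            for p in range(P):
                if p == player_idx:
                    continue
                dist = np.linalg.norm(positions[t, p] - positions[t, player_idx])
                # closer players contribute more to the context of the reference player
                descriptor += np.exp(-dist ** 2 / (2 * sigma ** 2)) * activities[t, p]
        n = descriptor.sum()
        return descriptor / n if n > 0 else descriptor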