Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hanno Gottschalk

TU Berlin

Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection

Jun 26, 2025

Tobias J. Riedlinger, Kira Maag, Hanno Gottschalk

Abstract:Deep neural networks have set the state-of-the-art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model's uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as their architectures and loss functions are tailored to task performance rather than probabilistic foundation. Even with well calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables a likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.

* 15 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL)

Jun 10, 2025

Nihal Acharya Adde, Alexandra Gianzina, Hanno Gottschalk, Andreas Ebert

Abstract:This paper introduces Evolutionary Multi-Objective Network Architecture Search (EMNAS) for the first time to optimize neural network architectures in large-scale Reinforcement Learning (RL) for Autonomous Driving (AD). EMNAS uses genetic algorithms to automate network design, tailored to enhance rewards and reduce model size without compromising performance. Additionally, parallelization techniques are employed to accelerate the search, and teacher-student methodologies are implemented to ensure scalable optimization. This research underscores the potential of transfer learning as a robust framework for optimizing performance across iterative learning processes by effectively leveraging knowledge from earlier generations to enhance learning efficiency and stability in subsequent generations. Experimental results demonstrate that tailored EMNAS outperforms manually designed models, achieving higher rewards with fewer parameters. The findings of these strategies contribute positively to EMNAS for RL in autonomous driving, advancing the field toward better-performing networks suitable for real-world scenarios.

* Published at ESANN 2025 Conference

Via

Access Paper or Ask Questions

Generative AI for Autonomous Driving: A Review

May 21, 2025

Katharina Winter, Abhishek Vivekanandan, Rupert Polley, Yinzhe Shen, Christian Schlauch, Mohamed-Khalil Bouzidi, Bojan Derajic, Natalie Grabowsky, Annajoyce Mariani, Dennis Rochau(+10 more)

Abstract:Generative AI (GenAI) is rapidly advancing the field of Autonomous Driving (AD), extending beyond traditional applications in text, image, and video generation. We explore how generative models can enhance automotive tasks, such as static map creation, dynamic scenario generation, trajectory forecasting, and vehicle motion planning. By examining multiple generative approaches ranging from Variational Autoencoder (VAEs) over Generative Adversarial Networks (GANs) and Invertible Neural Networks (INNs) to Generative Transformers (GTs) and Diffusion Models (DMs), we highlight and compare their capabilities and limitations for AD-specific applications. Additionally, we discuss hybrid methods integrating conventional techniques with generative approaches, and emphasize their improved adaptability and robustness. We also identify relevant datasets and outline open research questions to guide future developments in GenAI. Finally, we discuss three core challenges: safety, interpretability, and realtime capabilities, and present recommendations for image generation, dynamic scenario generation, and planning.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

Learning Brenier Potentials with Convex Generative Adversarial Neural Networks

Apr 28, 2025

Claudia Drygala, Hanno Gottschalk, Thomas Kruse, Ségolène Martin, Annika Mütze

Abstract:Brenier proved that under certain conditions on a source and a target probability measure there exists a strictly convex function such that its gradient is a transport map from the source to the target distribution. This function is called the Brenier potential. Furthermore, detailed information on the H\"older regularity of the Brenier potential is available. In this work we develop the statistical learning theory of generative adversarial neural networks that learn the Brenier potential. As by the transformation of densities formula, the density of the generated measure depends on the second derivative of the Brenier potential, we develop the universal approximation theory of ReCU networks with cubic activation $\mathtt{ReCU}(x)=\max\{0,x\}^3$ that combines the favorable approximation properties of H\"older functions with a Lipschitz continuous density. In order to assure the convexity of such general networks, we introduce an adversarial training procedure for a potential function represented by the ReCU networks that combines the classical discriminator cross entropy loss with a penalty term that enforces (strict) convexity. We give a detailed decomposition of learning errors and show that for a suitable high penalty parameter all networks chosen in the adversarial min-max optimization problem are strictly convex. This is further exploited to prove the consistency of the learning procedure for (slowly) expanding network capacity. We also implement the described learning algorithm and apply it to a number of standard test cases from Gaussian mixture to image data as target distributions. As predicted in theory, we observe that the convexity loss becomes inactive during the training process and the potentials represented by the neural networks have learned convexity.

Via

Access Paper or Ask Questions

Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -

Apr 16, 2025

Laura Fieback, Nishilkumar Balar, Jakob Spiegelberg, Hanno Gottschalk

Abstract:Despite recent advances in Large Vision Language Models (LVLMs), these models still suffer from generating hallucinatory responses that do not align with the visual input provided. To mitigate such hallucinations, we introduce Efficient Contrastive Decoding (ECD), a simple method that leverages probabilistic hallucination detection to shift the output distribution towards contextually accurate answers at inference time. By contrasting token probabilities and hallucination scores, ECD subtracts hallucinated concepts from the original distribution, effectively suppressing hallucinations. Notably, our proposed method can be applied to any open-source LVLM and does not require additional LVLM training. We evaluate our method on several benchmark datasets and across different LVLMs. Our experiments show that ECD effectively mitigates hallucinations, outperforming state-of-the-art methods with respect to performance on LVLM benchmarks and computation time.

Via

Access Paper or Ask Questions

Numerical and statistical analysis of NeuralODE with Runge-Kutta time integration

Mar 13, 2025

Emily C. Ehrhardt, Hanno Gottschalk, Tobias J. Riedlinger

Abstract:NeuralODE is one example for generative machine learning based on the push forward of a simple source measure with a bijective mapping, which in the case of NeuralODE is given by the flow of a ordinary differential equation. Using Liouville's formula, the log-density of the push forward measure is easy to compute and thus NeuralODE can be trained based on the maximum Likelihood method such that the Kulback-Leibler divergence between the push forward through the flow map and the target measure generating the data becomes small. In this work, we give a detailed account on the consistency of Maximum Likelihood based empirical risk minimization for a generic class of target measures. In contrast to prior work, we do not only consider the statistical learning theory, but also give a detailed numerical analysis of the NeuralODE algorithm based on the 2nd order Runge-Kutta (RK) time integration. Using the universal approximation theory for deep ReQU networks, the stability and convergence rated for the RK scheme as well as metric entropy and concentration inequalities, we are able to prove that NeuralODE is a probably approximately correct (PAC) learning algorithm.

* 29 pages

Via

Access Paper or Ask Questions

Segment-Level Road Obstacle Detection Using Visual Foundation Model Priors and Likelihood Ratios

Dec 07, 2024

Youssef Shoeb, Nazir Nayal, Azarm Nowzard, Fatma Güney, Hanno Gottschalk

Abstract:Detecting road obstacles is essential for autonomous vehicles to navigate dynamic and complex traffic environments safely. Current road obstacle detection methods typically assign a score to each pixel and apply a threshold to generate final predictions. However, selecting an appropriate threshold is challenging, and the per-pixel classification approach often leads to fragmented predictions with numerous false positives. In this work, we propose a novel method that leverages segment-level features from visual foundation models and likelihood ratios to predict road obstacles directly. By focusing on segments rather than individual pixels, our approach enhances detection accuracy, reduces false positives, and offers increased robustness to scene variability. We benchmark our approach against existing methods on the RoadObstacle and LostAndFound datasets, achieving state-of-the-art performance without needing a predefined threshold.

* 10 pages, 4 figures, and 1 table, to be published in VISAPP 2025

Via

Access Paper or Ask Questions

A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models

Nov 25, 2024

Manuel Schwonberg, Claus Werner, Hanno Gottschalk, Carsten Meyer

Abstract:Despite the recent progress in deep learning based computer vision, domain shifts are still one of the major challenges. Semantic segmentation for autonomous driving faces a wide range of domain shifts, e.g. caused by changing weather conditions, new geolocations and the frequent use of synthetic data in model training. Unsupervised domain adaptation (UDA) methods have emerged which adapt a model to a new target domain by only using unlabeled data of that domain. The variety of UDA methods is large but all of them use ImageNet pre-trained models. Recently, vision-language models have demonstrated strong generalization capabilities which may facilitate domain adaptation. We show that simply replacing the encoder of existing UDA methods like DACS by a vision-language pre-trained encoder can result in significant performance improvements of up to 10.0% mIoU on the GTA5-to-Cityscapes domain shift. For the generalization performance to unseen domains, the newly employed vision-language pre-trained encoder provides a gain of up to 13.7% mIoU across three unseen datasets. However, we find that not all UDA methods can be easily paired with the new encoder and that the UDA performance does not always likewise transfer into generalization performance. Finally, we perform our experiments on an adverse weather condition domain shift to further verify our findings on a pure real-to-real domain shift.

* Accepted to British Machine Vision Conference (BMVC) 2024: Workshop on Robust Recognition in the Open World (RROW)

Via

Access Paper or Ask Questions

Comparison of Generative Learning Methods for Turbulence Modeling

Nov 25, 2024

Claudia Drygala, Edmund Ross, Francesca di Mare, Hanno Gottschalk

Abstract:Numerical simulations of turbulent flows present significant challenges in fluid dynamics due to their complexity and high computational cost. High resolution techniques such as Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) are generally not computationally affordable, particularly for technologically relevant problems. Recent advances in machine learning, specifically in generative probabilistic models, offer promising alternatives for turbulence modeling. This paper investigates the application of three generative models - Variational Autoencoders (VAE), Deep Convolutional Generative Adversarial Networks (DCGAN), and Denoising Diffusion Probabilistic Models (DDPM) - in simulating a 2D K\'arm\'an vortex street around a fixed cylinder. Training data was obtained by means of LES. We evaluate each model's ability to capture the statistical properties and spatial structures of the turbulent flow. Our results demonstrate that DDPM and DCGAN effectively replicate the flow distribution, highlighting their potential as efficient and accurate tools for turbulence modeling. We find a strong argument for DCGAN, as although they are more difficult to train (due to problems such as mode collapse), they gave the fastest inference and training time, require less data to train compared to VAE and DDPM, and provide the results most closely aligned with the input stream. In contrast, VAE train quickly (and can generate samples quickly) but do not produce adequate results, and DDPM, whilst effective, is significantly slower at both inference and training time.

Via

Access Paper or Ask Questions

New advances in universal approximation with neural networks of minimal width

Nov 13, 2024

Dennis Rochau, Hanno Gottschalk, Robin Chan

Figure 1 for New advances in universal approximation with neural networks of minimal width

Figure 2 for New advances in universal approximation with neural networks of minimal width

Figure 3 for New advances in universal approximation with neural networks of minimal width

Abstract:Deep neural networks have achieved remarkable success in diverse applications, prompting the need for a solid theoretical foundation. Recent research has identified the minimal width $\max\{2,d_x,d_y\}$ required for neural networks with input dimensions $d_x$ and output dimension $d_y$ that use leaky ReLU activations to universally approximate $L^p(\mathbb{R}^{d_x},\mathbb{R}^{d_y})$ on compacta. Here, we present an alternative proof for the minimal width of such neural networks, by directly constructing approximating networks using a coding scheme that leverages the properties of leaky ReLUs and standard $L^p$ results. The obtained construction has a minimal interior dimension of $1$, independent of input and output dimensions, which allows us to show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions. Furthermore, we demonstrate that the normalizing flow LU-Net serves as a distributional universal approximator. We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^{d},\mathbb{R}^{d})$ on compacta when the dimension $d\geq 2$, which provides a constructive proof of a classical theorem of Brenier and Gangbo. In addition, we use a topological argument to establish that for FNNs with monotone Lipschitz continuous activations, $d_x+1$ is a lower bound on the minimal width required for the uniform universal approximation of continuous functions $C^0(\mathbb{R}^{d_x},\mathbb{R}^{d_y})$ on compacta when $d_x\geq d_y$.

* 56 pages in the main text, 70 pages with Appendix

Via

Access Paper or Ask Questions