Marco F. Huber

ViPro-2: Unsupervised State Estimation via Integrated Dynamics for Guiding Video Prediction

Aug 08, 2025

Patrick Takenaka, Johannes Maucher, Marco F. Huber

Abstract:Predicting future video frames is a challenging task with many downstream applications. Previous work has shown that procedural knowledge enables deep models for complex dynamical settings; however, their model ViPro assumed a given ground truth initial symbolic state. We show that this approach led to the model learning a shortcut that does not actually connect the observed environment with the predicted symbolic state, resulting in the inability to estimate states given an observation if previous states are noisy. In this work, we add several improvements to ViPro that enable the model to correctly infer states from observations without providing a full ground truth state in the beginning. We show that this is possible in an unsupervised manner, and extend the original Orbits dataset with a 3D variant to close the gap to real-world scenarios.

* Published in 2025 International Joint Conference on Neural Networks (IJCNN)

Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains

Jul 23, 2025

Jingyi Yu, Tim Pychynski, Marco F. Huber

Abstract:To gain deeper insights into a complex sensor system through the lens of causality, we present common and individual causal mechanism estimation (CICME), a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains. By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples. The identified common causal mechanisms are further used to guide the estimation of the remaining causal mechanisms in each domain individually. The performance of CICME is evaluated on linear Gaussian models under scenarios inspired by a manufacturing process. Building upon existing continuous optimization-based causal discovery methods, we show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.

Generative Adversarial Networks with Limited Data: A Survey and Benchmarking

Apr 07, 2025

Omar De Mitri, Ruyu Wang, Marco F. Huber

Abstract:Generative Adversarial Networks (GANs) have shown impressive results in various image synthesis tasks. Numerous studies have demonstrated that GANs are more powerful in feature and expression learning compared to other generative models and that their latent space encodes rich semantic information. However, the tremendous performance of GANs heavily relies on access to large-scale training data and deteriorates rapidly when the amount of data is limited. This paper aims to provide an overview of GANs, their variants, and their applications in various vision tasks, focusing on addressing the limited data issue. We analyze state-of-the-art GANs in the limited-data regime with designed experiments, and present various methods that attempt to tackle this problem from different perspectives. Finally, we further elaborate on remaining challenges and trends for future research.

Sample-Efficient Bayesian Transfer Learning for Online Machine Parameter Optimization

Mar 21, 2025

Philipp Wagner, Tobias Nagel, Philipp Leube, Marco F. Huber

Abstract:Correctly setting the parameters of a production machine is essential to improve product quality, increase efficiency, and reduce production costs while also supporting sustainability goals. Identifying optimal parameters involves an iterative process of producing an object and evaluating its quality. Minimizing the number of iterations is, therefore, desirable to reduce the costs associated with unsuccessful attempts. This work introduces a method to optimize the machine parameters in the system itself using a Bayesian optimization algorithm. By leveraging existing machine data, we use a transfer learning approach in order to identify an optimum with minimal iterations, resulting in a cost-effective transfer learning algorithm. We validate our approach on a laser machine for cutting sheet metal in the real world.

* Accepted in IEEE Conference on Artificial Intelligence, 2025

STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation

Mar 15, 2025

Ruyu Wang, Xuefeng Hou, Sabrina Schmedding, Marco F. Huber

Abstract:In layout-to-image (L2I) synthesis, controlled complex scenes are generated from coarse information like bounding boxes. Such a task is attractive for many downstream applications because the input layouts offer strong guidance to the generation process while remaining easily reconfigurable by humans. In this paper, we propose STyled LAYout Diffusion (STAY Diffusion), a diffusion-based model that produces photo-realistic images and provides fine-grained control of stylized objects in scenes. Our approach learns a global condition for each layout, and a self-supervised semantic map for weight modulation using a novel Edge-Aware Normalization (EA Norm). A new Styled-Mask Attention (SM Attention) is also introduced to cross-condition the global condition and image feature for capturing the objects' relationships. These measures provide consistent guidance through the model, enabling more accurate and controllable image generation. Extensive benchmarking demonstrates that our STAY Diffusion produces high-quality images while surpassing previous state-of-the-art methods in generation diversity, accuracy, and controllability.

* Accepted by WACV2025

Data Efficient Prediction of excited-state properties using Quantum Neural Networks

Dec 12, 2024

Manuel Hagelüken, Marco F. Huber, Marco Roth

Abstract:Understanding the properties of excited states of complex molecules is crucial for many chemical and physical processes. Calculating these properties is often significantly more resource-intensive than calculating their ground state counterparts. We present a quantum machine learning model that predicts excited-state properties from the molecular ground state for different geometric configurations. The model comprises a symmetry-invariant quantum neural network and a conventional neural network and is able to provide accurate predictions with only a few training data points. The proposed procedure is fully NISQ compatible. This is achieved by using a quantum circuit that requires a number of parameters linearly proportional to the number of molecular orbitals, along with a parameterized measurement observable, thereby reducing the number of necessary measurements. We benchmark the algorithm on three different molecules by evaluating its performance in predicting excited state transition energies and transition dipole moments. We show that, in many instances, the procedure is able to outperform various classical models that rely solely on classical features.

* 10 + 4 pages, 7 + 3 figures

The Contribution of XAI for the Safe Development and Certification of AI: An Expert-Based Analysis

Jul 22, 2024

Benjamin Fresz, Vincent Philipp Göbels, Safa Omri, Danilo Brajovic, Andreas Aichele, Janika Kutz, Jens Neuhüttler, Marco F. Huber

Abstract:Developing and certifying safe (or so-called trustworthy) AI has become an increasingly salient issue, especially in light of upcoming regulation such as the EU AI Act. In this context, the black-box nature of machine learning models limits the use of conventional avenues of approach towards certifying complex technical systems. As a potential solution, methods that give insights into this black box, devised in the field of eXplainable AI (XAI), could be used. In this study, the potential and shortcomings of such methods for the purpose of safe AI development and certification are discussed in 15 qualitative interviews with experts from the areas of (X)AI and certification. We find that XAI methods can be a helpful asset for safe AI development, as they can show biases and failures of ML models, but since certification relies on comprehensive and correct information about technical systems, their impact is expected to be limited.

Guiding Video Prediction with Explicit Procedural Knowledge

Jun 26, 2024

Patrick Takenaka, Johannes Maucher, Marco F. Huber

Abstract:We propose a general way to integrate procedural knowledge of a domain into deep learning models. We apply it to the case of video prediction, building on top of object-centric deep models, and show that this leads to better performance than using data-driven models alone. We develop an architecture that facilitates latent space disentanglement in order to use the integrated procedural knowledge, and establish a setup that allows the model to learn the procedural interface in the latent space using the downstream task of video prediction. We contrast the performance with a state-of-the-art data-driven approach and show that problems where purely data-driven approaches struggle can be handled by using knowledge about the domain, providing an alternative to simply collecting more data.

* 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2023, pp. 1076-1084

HIPer: A Human-Inspired Scene Perception Model for Multifunctional Mobile Robots

Apr 27, 2024

Florenz Graf, Jochen Lindermayr, Birgit Graf, Werner Kraus, Marco F. Huber

Abstract:Taking over arbitrary tasks like humans do with a mobile service robot in open-world settings requires a holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to minimize the gap between human and robotic capabilities. The approach adopts fundamental neuroscience concepts, such as a triplet perception split into recognition, knowledge representation, and knowledge interpretation. A recognition system splits the background and foreground to integrate exchangeable image-based object detectors and SLAM; a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control; and knowledge interpretation methods deploy spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study is used to evaluate the impact of each component on the overall performance for a fetch-and-carry scenario in two simulated environments and one real-world environment.

Overview of Publicly Available Degradation Data Sets for Tasks within Prognostics and Health Management

Mar 20, 2024

Fabian Mauthe, Christopher Braun, Julian Raible, Peter Zeiler, Marco F. Huber

Abstract:Central to the efficacy of prognostics and health management methods is the acquisition and analysis of degradation data, which encapsulates the evolving health condition of engineering systems over time. Degradation data serves as a rich source of information, offering invaluable insights into the underlying degradation processes, failure modes, and performance trends of engineering systems. This paper provides an overview of publicly available degradation data sets.
