Marin Vlastelica

Provable Maximum Entropy Manifold Exploration via Diffusion Models

Jun 18, 2025

Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause

Abstract:Exploration is critical for solving real-world decision-making problems such as scientific discovery, where the objective is to generate truly novel designs rather than mimic existing data distributions. In this work, we address the challenge of leveraging the representational power of generative models for exploration without relying on explicit uncertainty quantification. We introduce a novel framework that casts exploration as entropy maximization over the approximate data manifold implicitly defined by a pre-trained diffusion model. Then, we present a novel principle for exploration based on density estimation, a problem well-known to be challenging in practice. To overcome this issue and render this method truly scalable, we leverage a fundamental connection between the entropy of the density induced by a diffusion model and its score function. Building on this, we develop an algorithm based on mirror descent that solves the exploration problem as sequential fine-tuning of a pre-trained diffusion model. We prove its convergence to the optimal exploratory diffusion model under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we empirically evaluate our approach on both synthetic and high-dimensional text-to-image diffusion, demonstrating promising results.

* ICML 2025
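
The abstract above describes entropy maximization solved by mirror descent. As a rough illustration only, the following toy sketch runs entropic mirror ascent on a small discrete distribution. It is not the paper's algorithm, which fine-tunes a pre-trained diffusion model; the function names, step size, and the finite support are assumptions made for this example, and the starting distribution is assumed to be strictly positive.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mirror_ascent_step(p, eta):
    """One mirror ascent step on entropy with the negative-entropy mirror map.

    The update is p_new proportional to p * exp(eta * grad), where the
    gradient of the entropy with respect to p_i is -(log p_i + 1).
    Assumes every entry of p is strictly positive.
    """
    logits = [math.log(x) - eta * (math.log(x) + 1.0) for x in p]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def maximize_entropy(p0, eta=0.5, steps=50):
    """Iterate the step above; eta and steps are illustrative choices."""
    p = list(p0)
    for _ in range(steps):
        p = mirror_ascent_step(p, eta)
    return p


if __name__ == "__main__":
    start = [0.7, 0.2, 0.1]
    final = maximize_entropy(start)
    print(entropy(start), entropy(final))
```

In this toy setting each step shrinks the log-ratios between entries by a factor of `1 - eta`, so the iterates approach the uniform distribution on the support, which is the entropy maximizer there.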

Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints

Jan 08, 2025

Pavel Kolev, Marin Vlastelica, Georg Martius

Abstract:While many algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms without environment interactions. Tackling this problem in the offline setting, however, presents significant challenges that require non-trivial, multi-stage optimization processes with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on Van der Waals (VdW) force and successor features, and eliminates the need to learn a previously used skill discriminator. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method allows for better handling of non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm benefits from receiving a consistently strong diversity signal (VdW), and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.

Causal Action Influence Aware Counterfactual Data Augmentation

May 29, 2024

Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, Georg Martius

Abstract:Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping action-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.

* Accepted at the 41st International Conference on Machine Learning (ICML 2024)

Learning Diverse Skills for Local Navigation under Multi-constraint Optimality

Oct 03, 2023

Jin Cheng, Marin Vlastelica, Pavel Kolev, Chenhao Li, Georg Martius

Abstract:Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.

* 7 pages, 6 figures, in submission to ICRA 2024

Diffusion Generative Inverse Design

Sep 18, 2023

Marin Vlastelica, Tatiana López-Guevara, Kelsey Allen, Peter Battaglia, Arnaud Doucet, Kimberley Stachenfeld

Abstract:Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems. In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.

* ICML workshop on Structured Probabilistic Inference & Generative Modeling

Mind the Uncertainty: Risk-Aware and Actively Exploring Model-Based Reinforcement Learning

Sep 11, 2023

Marin Vlastelica, Sebastian Blaes, Cristina Pineri, Georg Martius

Abstract:We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.
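
To make the separation of uncertainties mentioned above concrete, here is a minimal sketch of the standard variance decomposition for an ensemble whose members each predict a mean and a variance. It is an illustration under stated assumptions, not the paper's implementation: the function names, the scalar-output setting, and the form and weights of the scoring rule (`beta_opt`, `beta_pes`) are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split the predictive variance of an ensemble for one scalar output.

    means[i] and variances[i] are the predicted mean and variance of
    ensemble member i (hypothetical inputs for illustration).
    By the law of total variance, total variance = epistemic + aleatoric.
    """
    aleatoric = fmean(variances)  # average noise predicted by the members
    epistemic = pvariance(means)  # disagreement between the members' means
    return epistemic, aleatoric


def risk_aware_score(reward, epistemic, aleatoric, beta_opt=1.0, beta_pes=1.0):
    """Illustrative score: bonus for epistemic, penalty for aleatoric std.

    The additive form and both weights are assumptions of this sketch.
    """
    return reward + beta_opt * epistemic ** 0.5 - beta_pes * aleatoric ** 0.5


if __name__ == "__main__":
    eps, ale = decompose_uncertainty([1.0, 3.0], [0.5, 1.5])
    print(eps, ale, risk_aware_score(2.0, eps, ale))
```

Keeping the two terms separate lets a planner treat them with opposite signs, which a single pooled variance would not allow.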

Diverse Offline Imitation via Fenchel Duality

Jul 21, 2023

Marin Vlastelica, Pavel Kolev, Jin Cheng, Georg Martius

Abstract:There has been significant recent progress in the area of unsupervised skill discovery, with various works proposing mutual-information-based objectives as a source of intrinsic motivation. Prior works predominantly focused on designing algorithms that require online access to the environment. In contrast, we develop an offline skill discovery algorithm. Our problem formulation considers the maximization of a mutual information objective constrained by a KL-divergence. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.

Spuriosity Didn't Kill the Classifier: Using Invariant Predictions to Harness Spurious Features

Jul 19, 2023

Cian Eastwood, Shashank Singh, Andrei Liviu Nicolicioiu, Marin Vlastelica, Julius von Kügelgen, Bernhard Schölkopf

Abstract:To avoid failures on out-of-distribution data, recent works have sought to extract features that have a stable or invariant relationship with the label across domains, discarding the "spurious" or unstable features whose relationship with the label changes across domains. However, unstable features often carry complementary information about the label that could boost performance if used correctly in the test domain. Our main contribution is to show that it is possible to learn how to use these unstable features in the test domain without labels. In particular, we prove that pseudo-labels based on stable features provide sufficient guidance for doing so, provided that stable and unstable features are conditionally independent given the label. Based on this theoretical insight, we propose Stable Feature Boosting (SFB), an algorithm for: (i) learning a predictor that separates stable and conditionally-independent unstable features; and (ii) using the stable-feature predictions to adapt the unstable-feature predictions in the test domain. Theoretically, we prove that SFB can learn an asymptotically-optimal predictor without test-domain labels. Empirically, we demonstrate the effectiveness of SFB on real and synthetic data.

Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions

Sep 16, 2022

Chenhao Li, Sebastian Blaes, Pavel Kolev, Marin Vlastelica, Jonas Frey, Georg Martius

Abstract:Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. These methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining single versatile policies with controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.

Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations

Jun 23, 2022

Chenhao Li, Marin Vlastelica, Sebastian Blaes, Jonas Frey, Felix Grimminger, Georg Martius

Abstract:Learning agile skills is one of the main challenges in robotics. To this end, reinforcement learning approaches have achieved impressive results. These methods require explicit task information in terms of a reward function or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations for successful skill acquirement where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors. Finally, the obtained skills such as a backflip are tested on an agile quadruped robot called Solo 8 and present faithful replication of hand-held human demonstrations.
