Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Cody Fleming

Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning

Jun 26, 2025

Prajwal Koirala, Cody Fleming

Abstract:Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the \textit{Single-Step Completion Policy} (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.

Via

Access Paper or Ask Questions

Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation?

Jun 17, 2025

Nitesh Subedi, Adam Haroon, Shreyan Ganguly, Samuel T. K. Tetteh, Prajwal Koirala, Cody Fleming, Soumik Sarkar

Abstract:Foundation models have revolutionized robotics by providing rich semantic representations without task-specific training. While many approaches integrate pretrained vision-language models (VLMs) with specialized navigation architectures, the fundamental question remains: can these pretrained embeddings alone successfully guide navigation without additional fine-tuning or specialized modules? We present a minimalist framework that decouples this question by training a behavior cloning policy directly on frozen vision-language embeddings from demonstrations collected by a privileged expert. Our approach achieves a 74% success rate in navigation to language-specified targets, compared to 100% for the state-aware expert, though requiring 3.2 times more steps on average. This performance gap reveals that pretrained embeddings effectively support basic language grounding but struggle with long-horizon planning and spatial reasoning. By providing this empirical baseline, we highlight both the capabilities and limitations of using foundation models as drop-in representations for embodied tasks, offering critical insights for robotics researchers facing practical design tradeoffs between system complexity and performance in resource-constrained scenarios. Our code is available at https://github.com/oadamharoon/text2nav

* 6 figures, 2 tables, Accepted to Robotics: Science and Systems (RSS) 2025 Workshop on Robot Planning in the Era of Foundation Models (FM4RoboPlan)

Via

Access Paper or Ask Questions

HopCast: Calibration of Autoregressive Dynamics Models

Jan 27, 2025

Muhammad Bilal Shahid, Cody Fleming

Figure 1 for HopCast: Calibration of Autoregressive Dynamics Models

Figure 2 for HopCast: Calibration of Autoregressive Dynamics Models

Figure 3 for HopCast: Calibration of Autoregressive Dynamics Models

Figure 4 for HopCast: Calibration of Autoregressive Dynamics Models

Abstract:Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. These models are optimized to predict one step ahead and produce calibrated predictions if the predictive model can quantify uncertainty, such as deep ensembles. At inference time, multi-step predictions are generated via autoregression, which needs a sound uncertainty propagation method (e.g., Trajectory Sampling) to produce calibrated multi-step predictions. This paper introduces an approach named HopCast that uses the Modern Hopfield Network (MHN) to learn the residuals of a deterministic model that approximates the dynamical system. The MHN predicts the density of residuals based on a context vector at any timestep during autoregression. This approach produces calibrated multi-step predictions without uncertainty propagation and turns a deterministic model into a calibrated probabilistic model. This work is also the first to benchmark existing uncertainty propagation methods based on calibration errors with deep ensembles for multi-step predictions.

Via

Access Paper or Ask Questions

FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning

Dec 12, 2024

Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming

Abstract:Safe offline reinforcement learning aims to learn policies that maximize cumulative rewards while adhering to safety constraints, using only offline data for training. A key challenge is balancing safety and performance, particularly when the policy encounters out-of-distribution (OOD) states and actions, which can lead to safety violations or overly conservative behavior during deployment. To address these challenges, we introduce Feasibility Informed Advantage Weighted Actor-Critic (FAWAC), a method that prioritizes persistent safety in constrained Markov decision processes (CMDPs). FAWAC formulates policy optimization with feasibility conditions derived specifically for offline datasets, enabling safe policy updates in non-parametric policy space, followed by projection into parametric space for constrained actor training. By incorporating a cost-advantage term into Advantage Weighted Regression (AWR), FAWAC ensures that the safety constraints are respected while maximizing performance. Additionally, we propose a strategy to address a more challenging class of problems that involves tempting datasets where trajectories are predominantly high-rewarded but unsafe. Empirical evaluations on standard benchmarks demonstrate that FAWAC achieves strong results, effectively balancing safety and performance in learning policies from the static datasets.

Via

Access Paper or Ask Questions

Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

Dec 11, 2024

Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming

Abstract:In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.

Via

Access Paper or Ask Questions

F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Aug 08, 2024

Prajwal Koirala, Cody Fleming

Figure 1 for F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Figure 2 for F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Figure 3 for F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Figure 4 for F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Abstract:Autonomous racing serves as a critical platform for evaluating automated driving systems and enhancing vehicle mobility intelligence. This work investigates offline reinforcement learning methods to train agents within the dynamic F1tenth racing environment. The study begins by exploring the challenges of online training in the Austria race track environment, where agents consistently fail to complete the laps. Consequently, this research pivots towards an offline strategy, leveraging `expert' demonstration dataset to facilitate agent training. A waypoint-based suboptimal controller is developed to gather data with successful lap episodes. This data is then employed to train offline learning-based algorithms, with a subsequent analysis of the agents' cross-track performance, evaluating their zero-shot transferability from seen to unseen scenarios and their capacity to adapt to changes in environment dynamics. Beyond mere algorithm benchmarking in autonomous racing scenarios, this study also introduces and describes the machinery of our return-conditioned decision tree-based policy, comparing its performance with methods that employ fully connected neural networks, Transformers, and Diffusion Policies and highlighting some insights into method selection for training autonomous agents in driving interactions.

Via

Access Paper or Ask Questions

Towards Robust Car Following Dynamics Modeling via Blackbox Models: Methodology, Analysis, and Recommendations

Feb 11, 2024

Muhammad Bilal Shahid, Cody Fleming

Abstract:The selection of the target variable is important while learning parameters of the classical car following models like GIPPS, IDM, etc. There is a vast body of literature on which target variable is optimal for classical car following models, but there is no study that empirically evaluates the selection of optimal target variables for black-box models, such as LSTM, etc. The black-box models, like LSTM and Gaussian Process (GP) are increasingly being used to model car following behavior without wise selection of target variables. The current work tests different target variables, like acceleration, velocity, and headway, for three black-box models, i.e., GP, LSTM, and Kernel Ridge Regression. These models have different objective functions and work in different vector spaces, e.g., GP works in function space, and LSTM works in parameter space. The experiments show that the optimal target variable recommendations for black-box models differ from classical car following models depending on the objective function and the vector space. It is worth mentioning that models and datasets used during evaluation are diverse in nature: the datasets contained both automated and human-driven vehicle trajectories; the black-box models belong to both parametric and non-parametric classes of models. This diversity is important during the analysis of variance, wherein we try to find the interaction between datasets, models, and target variables. It is shown that the models and target variables interact and recommended target variables don't depend on the dataset under consideration.

Via

Access Paper or Ask Questions

Reframing Offline Reinforcement Learning as a Regression Problem

Jan 21, 2024

Prajwal Koirala, Cody Fleming

Figure 1 for Reframing Offline Reinforcement Learning as a Regression Problem

Figure 2 for Reframing Offline Reinforcement Learning as a Regression Problem

Figure 3 for Reframing Offline Reinforcement Learning as a Regression Problem

Figure 4 for Reframing Offline Reinforcement Learning as a Regression Problem

Abstract:The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees. Aiming to predict actions based on input states, return-to-go (RTG), and timestep information, we observe that with gradient-boosted trees, the agent training and inference are very fast, the former taking less than a minute. Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods. This assertion is validated by testing it across standard datasets associated with D4RL Gym-MuJoCo tasks. We further discuss the agent's ability to generalize by testing it on two extreme cases, how it learns to model the return distributions effectively even with highly skewed expert datasets, and how it exhibits robust performance in scenarios with sparse/delayed rewards.

Via

Access Paper or Ask Questions

SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Jan 29, 2021

Jasmine Sekhon, Cody Fleming

Figure 1 for SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Figure 2 for SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Figure 3 for SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Figure 4 for SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Abstract:Safe navigation of autonomous agents in human centric environments requires the ability to understand and predict motion of neighboring pedestrians. However, predicting pedestrian intent is a complex problem. Pedestrian motion is governed by complex social navigation norms, is dependent on neighbors' trajectories, and is multimodal in nature. In this work, we propose \textbf{SCAN}, a \textbf{S}patial \textbf{C}ontext \textbf{A}ttentive \textbf{N}etwork that can jointly predict socially-acceptable multiple future trajectories for all pedestrians in a scene. SCAN encodes the influence of spatially close neighbors using a novel spatial attention mechanism in a manner that relies on fewer assumptions, is parameter efficient, and is more interpretable compared to state-of-the-art spatial attention approaches. Through experiments on several datasets we demonstrate that our approach can also quantitatively outperform state of the art trajectory prediction methods in terms of accuracy of predicted intent.

Via

Access Paper or Ask Questions

ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Jun 16, 2020

James Ferlez, Mahmoud Elnaggar, Yasser Shoukry, Cody Fleming

Figure 1 for ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Figure 2 for ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Figure 3 for ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Figure 4 for ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

Abstract:In this paper, we consider the problem of creating a safe-by-design Rectified Linear Unit (ReLU) Neural Network (NN), which, when composed with an arbitrary control NN, makes the composition provably safe. In particular, we propose an algorithm to synthesize such NN filters that safely correct control inputs generated for the continuous-time Kinematic Bicycle Model (KBM). ShieldNN contains two main novel contributions: first, it is based on a novel Barrier Function (BF) for the KBM model; and second, it is itself a provably sound algorithm that leverages this BF to a design a safety filter NN with safety guarantees. Moreover, since the KBM is known to well approximate the dynamics of four-wheeled vehicles, we show the efficacy of ShieldNN filters in CARLA simulations of four-wheeled vehicles. In particular, we examined the effect of ShieldNN filters on Deep Reinforcement Learning trained controllers in the presence of individual pedestrian obstacles. The safety properties of ShieldNN were borne out in our experiments: the ShieldNN filter reduced the number of obstacle collisions by 99.4%-100%. Furthermore, we also studied the effect of incorporating ShieldNN during training: for a constant number of episodes, 28% less reward was observed when ShieldNN wasn't used during training. This suggests that ShieldNN has the further property of improving sample efficiency during RL training.

Via

Access Paper or Ask Questions