Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Neeloy Chakraborty

Adaptive Stress Testing Black-Box LLM Planners

May 08, 2025

Neeloy Chakraborty, John Pohovey, Melkior Ornik, Katherine Driggs-Campbell

Abstract:Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing black-box methods often detect hallucinations by identifying inconsistencies across multiple samples. Many of these approaches typically introduce prompt perturbations like randomizing detail order or generating adversarial inputs, with the intuition that a confident model should produce stable outputs. We first perform a manual case study showing that other forms of perturbations (e.g., adding noise, removing sensor details) cause LLMs to hallucinate in a driving environment. We then propose a novel method for efficiently searching the space of prompt perturbations using Adaptive Stress Testing (AST) with Monte-Carlo Tree Search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty. By generating MCTS prompt perturbation trees across diverse scenarios, we show that offline analyses can be used at runtime to automatically generate prompts that influence model uncertainty, and to inform real-time trust assessments of an LLM.

* 26 pages, 16 figures, 4 tables

Via

Access Paper or Ask Questions

An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving

Feb 23, 2025

Tianchen Ji, Neeloy Chakraborty, Andre Schreiber, Katherine Driggs-Campbell

Abstract:As automated vehicles enter public roads, safety in a near-infinite number of driving scenarios becomes one of the major concerns for the widespread adoption of fully autonomous driving. The ability to detect anomalous situations outside of the operational design domain is a key component in self-driving cars, enabling us to mitigate the impact of abnormal ego behaviors and to realize trustworthy driving systems. On-road anomaly detection in egocentric videos remains a challenging problem due to the difficulties introduced by complex and interactive scenarios. We conduct a holistic analysis of common on-road anomaly patterns, from which we propose three unsupervised anomaly detection experts: a scene expert that focuses on frame-level appearances to detect abnormal scenes and unexpected scene motions; an interaction expert that models normal relative motions between two road participants and raises alarms whenever anomalous interactions emerge; and a behavior expert which monitors abnormal behaviors of individual objects by future trajectory prediction. To combine the strengths of all the modules, we propose an expert ensemble (Xen) using a Kalman filter, in which the final anomaly score is absorbed as one of the states and the observations are generated by the experts. Our experiments employ a novel evaluation protocol for realistic model performance, demonstrate superior anomaly detection performance than previous methods, and show that our framework has potential in classifying anomaly types using unsupervised learning on a large-scale on-road anomaly dataset.

* Accepted by International Journal of Robotics Research (IJRR)

Via

Access Paper or Ask Questions

HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Nov 19, 2024

Shuijing Liu, Haochen Xia, Fatemeh Cheraghi Pouria, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell

Figure 1 for HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Figure 2 for HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Figure 3 for HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Figure 4 for HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Abstract:We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment and propose a heterogeneous spatio-temporal (st) graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change. More videos are available at https://sites.google.com/view/crowdnav-height/home.

Via

Access Paper or Ask Questions

Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Sep 16, 2024

Neeloy Chakraborty, Yixiao Fang, Andre Schreiber, Tianchen Ji, Zhe Huang, Aganze Mihigo, Cassidy Wall, Abdulrahman Almana, Katherine Driggs-Campbell

Figure 1 for Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Figure 2 for Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Figure 3 for Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Figure 4 for Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Abstract:Teleoperation is an important technology to enable supervisors to control agricultural robots remotely. However, environmental factors in dense crop rows and limitations in network infrastructure hinder the reliability of data streamed to teleoperators. These issues result in delayed and variable frame rate video feeds that often deviate significantly from the robot's actual viewpoint. We propose a modular learning-based vision pipeline to generate delay-compensated images in real-time for supervisors. Our extensive offline evaluations demonstrate that our method generates more accurate images compared to state-of-the-art approaches in our setting. Additionally, we are one of the few works to evaluate a delay-compensation method in outdoor field environments with complex terrain on data from a real robot in real-time. Additional videos are provided at https://sites.google.com/illinois.edu/comp-teleop.

* 8 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

Cooperative Advisory Residual Policies for Congestion Mitigation

Jun 30, 2024

Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell

Figure 1 for Cooperative Advisory Residual Policies for Congestion Mitigation

Figure 2 for Cooperative Advisory Residual Policies for Congestion Mitigation

Figure 3 for Cooperative Advisory Residual Policies for Congestion Mitigation

Figure 4 for Cooperative Advisory Residual Policies for Congestion Mitigation

Abstract:Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice as they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and also fail to account for uncertainty in driver behavior. To this end, we develop a class of learned residual policies that can be used in cooperative advisory systems and only require the use of a single vehicle with a human driver. Our policies advise drivers to behave in ways that mitigate traffic congestion while accounting for diverse driver behaviors, particularly drivers' reactions to instructions, to provide an improved user experience. To realize such policies, we introduce an improved reward function that explicitly addresses congestion mitigation and driver attitudes to advice. We show that our residual policies can be personalized by conditioning them on an inferred driver trait that is learned in an unsupervised manner with a variational autoencoder. Our policies are trained in simulation with our novel instruction adherence driver model, and evaluated in simulation and through a user study (N=16) to capture the sentiments of human drivers. Our results show that our approaches successfully mitigate congestion while adapting to different driver behaviors, with up to 20% and 40% improvement as measured by a combination metric of speed and deviations in speed across time over baselines in our simulation tests and user study, respectively. Our user study further shows that our policies are human-compatible and personalize to drivers.

Via

Access Paper or Ask Questions

Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

May 28, 2024

Shuijing Liu, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell

Figure 1 for Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Figure 2 for Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Figure 3 for Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Figure 4 for Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Abstract:We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator. We introduce a representation of the dynamic environment, separating human and obstacle representations. Humans are represented through detected states, while obstacles are represented as computed point clouds based on maps and robot localization. This representation enables RL policies trained in a low-fidelity simulator to deploy in real world with a reduced sim2real gap. Additionally, we propose a spatio-temporal graph to model the interactions between agents and obstacles. Based on the graph, we use attention mechanisms to capture the robot-human, human-human, and human-obstacle interactions. Our method significantly improves navigation performance in both simulated and real-world environments. Video demonstrations can be found at https://sites.google.com/view/constrained-crowdnav/home.

Via

Access Paper or Ask Questions

Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art

Mar 25, 2024

Neeloy Chakraborty, Melkior Ornik, Katherine Driggs-Campbell

Abstract:Autonomous systems are soon to be ubiquitous, from manufacturing autonomy to agricultural field robots, and from health care assistants to the entertainment industry. The majority of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these existing approaches have been shown to perform well under the situations they were specifically designed for, they can perform especially poorly in rare, out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets from a variety of fields has led researchers to believe that these models may provide common sense reasoning that existing planners are missing. Researchers posit that this common sense reasoning will bridge the gap between algorithm development and deployment to out-of-distribution tasks, like how humans adapt to unexpected scenarios. Large language models have already penetrated the robotics and autonomous systems domains as researchers are scrambling to showcase their potential use cases in deployment. While this application direction is very promising empirically, foundation models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, and explore areas for further research in this exciting field.

* 31 pages, 2 tables

Via

Access Paper or Ask Questions

PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Aug 15, 2023

Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell

Figure 1 for PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Figure 2 for PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Figure 3 for PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Figure 4 for PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Abstract:Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.

* Accepted to ITSC 2023. Additional material and code is available at the project webpage: https://sites.google.com/illinois.edu/perp

Via

Access Paper or Ask Questions

Towards Co-operative Congestion Mitigation

Feb 17, 2023

Aamir Hasan, Neeloy Chakraborty, Cathy Wu, Katherine Driggs-Campbell

Figure 1 for Towards Co-operative Congestion Mitigation

Figure 2 for Towards Co-operative Congestion Mitigation

Figure 3 for Towards Co-operative Congestion Mitigation

Figure 4 for Towards Co-operative Congestion Mitigation

Abstract:The effects of traffic congestion are widespread and are an impedance to everyday life. Piecewise constant driving policies have shown promise in helping mitigate traffic congestion in simulation environments. However, no works currently test these policies in situations involving real human users. Thus, we propose to evaluate these policies through the use of a shared control framework in a collaborative experiment with the human driver and the driving policy aiming to co-operatively mitigate congestion. We intend to use the CARLA simulator alongside the Flow framework to conduct user studies to evaluate the affect of piecewise constant driving policies. As such, we present our in-progress work in building our framework and discuss our proposed plan on evaluating this framework through a human-in-the-loop simulation user study.

* Presented at the ICRA 2022 Workshop on Shared Autonomy in Physical Human-Robot Interaction: Adaptability and Trust

Via

Access Paper or Ask Questions

Learning Rewards and Skills to Follow Commands with A Data Efficient Visual-Audio Representation

Jan 23, 2023

Peixin Chang, Shuijing Liu, Tianchen Ji, Neeloy Chakraborty, D. Livingston McPherson, Katherine Driggs-Campbell

Abstract:Based on the recent advancements in representation learning, we propose a novel framework for command-following robots with raw sensor inputs. Previous RL-based methods are either difficult to continuously improve after the deployment or require a large number of new labels during the fine-tuning. Motivated by (self-)supervised contrastive learning literature, we propose a novel representation, named VAR++, that generates an intrinsic reward function for command-following robot tasks by associating images with sound commands. After the robot is deployed in a new domain, the representation can be updated intuitively and data-efficiently by non-experts, and the robot is able to fulfill sound commands without any hand-crafted reward functions. We demonstrate our approach on various sound types and robotic tasks, including navigation and manipulation with raw sensor inputs. In the simulated experiments, we show that our system can continually self-improve in previously unseen scenarios given fewer new labeled data, yet achieves better performance, compared with previous methods.

Via

Access Paper or Ask Questions