Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stefanos Nikolaidis

AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

Jun 05, 2025

Saeed Hedayatian, Stefanos Nikolaidis

Abstract:Quality-Diversity (QD) algorithms have shown remarkable success in discovering diverse, high-performing solutions, but rely heavily on hand-crafted behavioral descriptors that constrain exploration to predefined notions of diversity. Leveraging the equivalence between policies and occupancy measures, we present a theoretically grounded approach to automatically generate behavioral descriptors by embedding the occupancy measures of policies in Markov Decision Processes. Our method, AutoQD, leverages random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures, creating embeddings whose distances reflect meaningful behavioral differences. A low-dimensional projection of these embeddings that captures the most behaviorally significant dimensions is then used as behavioral descriptors for off-the-shelf QD methods. We prove that our embeddings converge to true MMD distances between occupancy measures as the number of sampled trajectories and embedding dimensions increase. Through experiments in multiple continuous control tasks we demonstrate AutoQD's ability in discovering diverse policies without predefined behavioral descriptors, presenting a well-motivated alternative to prior methods in unsupervised Reinforcement Learning and QD optimization. Our approach opens new possibilities for open-ended learning and automated behavior discovery in sequential decision making settings without requiring domain-specific knowledge.

* 22 pages, 5 figures

Via

Access Paper or Ask Questions

Multi-Objective Covariance Matrix Adaptation MAP-Annealing

May 27, 2025

Shihan Zhao, Stefanos Nikolaidis

Abstract:Quality-Diversity (QD) optimization is an emerging field that focuses on finding a set of behaviorally diverse and high-quality solutions. While the quality is typically defined w.r.t. a single objective function, recent work on Multi-Objective Quality-Diversity (MOQD) extends QD optimization to simultaneously optimize multiple objective functions. This opens up multi-objective applications for QD, such as generating a diverse set of game maps that maximize difficulty, realism, or other properties. Existing MOQD algorithms use non-adaptive methods such as mutation and crossover to search for non-dominated solutions and construct an archive of Pareto Sets (PS). However, recent work in QD has demonstrated enhanced performance through the use of covariance-based evolution strategies for adaptive solution search. We propose bringing this insight into the MOQD problem, and introduce MO-CMA-MAE, a new MOQD algorithm that leverages Covariance Matrix Adaptation-Evolution Strategies (CMA-ES) to optimize the hypervolume associated with every PS within the archive. We test MO-CMA-MAE on three MOQD domains, and for generating maps of a co-operative video game, showing significant improvements in performance.

* Accepted at GECCO 2025; 16 pages, 13 figures

Via

Access Paper or Ask Questions

Integrating Field of View in Human-Aware Collaborative Planning

May 20, 2025

Ya-Chuan Hsu, Michael Defranco, Rutvik Patel, Stefanos Nikolaidis

Abstract:In human-robot collaboration (HRC), it is crucial for robot agents to consider humans' knowledge of their surroundings. In reality, humans possess a narrow field of view (FOV), limiting their perception. However, research on HRC often overlooks this aspect and presumes an omniscient human collaborator. Our study addresses the challenge of adapting to the evolving subtask intent of humans while accounting for their limited FOV. We integrate FOV within the human-aware probabilistic planning framework. To account for large state spaces due to considering FOV, we propose a hierarchical online planner that efficiently finds approximate solutions while enabling the robot to explore low-level action trajectories that enter the human FOV, influencing their intended subtask. Through user study with our adapted cooking domain, we demonstrate our FOV-aware planner reduces human's interruptions and redundant actions during collaboration by adapting to human perception limitations. We extend these findings to a virtual reality kitchen environment, where we observe similar collaborative behaviors.

* Published at International Conference on Robotics and Automation (2025)

Via

Access Paper or Ask Questions

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

May 14, 2025

Enyu Zhao, Vedant Raval, Hejia Zhang, Jiageng Mao, Zeyu Shangguan, Stefanos Nikolaidis, Yue Wang, Daniel Seita

Abstract:Vision-Language Models (VLMs) have revolutionized artificial intelligence and robotics due to their commonsense reasoning capabilities. In robotic manipulation, VLMs are used primarily as high-level planners, but recent work has also studied their lower-level reasoning ability, which refers to making decisions about precise robot movements. However, the community currently lacks a clear and common benchmark that can evaluate how well VLMs can aid low-level reasoning in robotics. Consequently, we propose a novel benchmark, ManipBench, to evaluate the low-level robot manipulation reasoning capabilities of VLMs across various dimensions, including how well they understand object-object interactions and deformable object manipulation. We extensively test 33 representative VLMs across 10 model families on our benchmark, including variants to test different model sizes. Our evaluation shows that the performance of VLMs significantly varies across tasks, and there is a strong correlation between this performance and trends in our real-world manipulation tasks. It also shows that there remains a significant gap between these models and human-level understanding. See our website at: https://manipbench.github.io.

* 47 pages, 29 figures. Under review

Via

Access Paper or Ask Questions

Modeling Personalized Difficulty of Rehabilitation Exercises Using Causal Trees

May 07, 2025

Nathaniel Dennler, Zhonghao Shi, Uksang Yoo, Stefanos Nikolaidis, Maja Matarić

Abstract:Rehabilitation robots are often used in game-like interactions for rehabilitation to increase a person's motivation to complete rehabilitation exercises. By adjusting exercise difficulty for a specific user throughout the exercise interaction, robots can maximize both the user's rehabilitation outcomes and the their motivation throughout the exercise. Previous approaches have assumed exercises have generic difficulty values that apply to all users equally, however, we identified that stroke survivors have varied and unique perceptions of exercise difficulty. For example, some stroke survivors found reaching vertically more difficult than reaching farther but lower while others found reaching farther more challenging than reaching vertically. In this paper, we formulate a causal tree-based method to calculate exercise difficulty based on the user's performance. We find that this approach accurately models exercise difficulty and provides a readily interpretable model of why that exercise is difficult for both users and caretakers.

* Accepted to IEEE/RAS-EMBS International Conference on Rehabilitation Robotics (ICORR 2025)

Via

Access Paper or Ask Questions

Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models

Apr 04, 2025

Siddharth Srikanth, Varun Bhatt, Boshen Zhang, Werner Hager, Charles Michael Lewis, Katia P. Sycara, Aaquib Tabrez, Stefanos Nikolaidis

Abstract:Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. But, obtaining a large set of diverse behaviors requires manual effort in the form of designing prompts. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54 participants), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

Via

Access Paper or Ask Questions

Soft and Compliant Contact-Rich Hair Manipulation and Care

Jan 05, 2025

Uksang Yoo, Nathaniel Dennler, Eliot Xing, Maja Matarić, Stefanos Nikolaidis, Jeffrey Ichnowski, Jean Oh

Figure 1 for Soft and Compliant Contact-Rich Hair Manipulation and Care

Figure 2 for Soft and Compliant Contact-Rich Hair Manipulation and Care

Figure 3 for Soft and Compliant Contact-Rich Hair Manipulation and Care

Figure 4 for Soft and Compliant Contact-Rich Hair Manipulation and Care

Abstract:Hair care robots can help address labor shortages in elderly care while enabling those with limited mobility to maintain their hair-related identity. We present MOE-Hair, a soft robot system that performs three hair-care tasks: head patting, finger combing, and hair grasping. The system features a tendon-driven soft robot end-effector (MOE) with a wrist-mounted RGBD camera, leveraging both mechanical compliance for safety and visual force sensing through deformation. In testing with a force-sensorized mannequin head, MOE achieved comparable hair-grasping effectiveness while applying significantly less force than rigid grippers. Our novel force estimation method combines visual deformation data and tendon tensions from actuators to infer applied forces, reducing sensing errors by up to 60.1% and 20.3% compared to actuator current load-only and depth image-only baselines, respectively. A user study with 12 participants demonstrated statistically significant preferences for MOE-Hair over a baseline system in terms of comfort, effectiveness, and appropriate force application. These results demonstrate the unique advantages of soft robots in contact-rich hair-care tasks, while highlighting the importance of precise force control despite the inherent compliance of the system.

Via

Access Paper or Ask Questions

Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

Jan 02, 2025

Nathaniel Dennler, Stefanos Nikolaidis, Maja Matarić

Figure 1 for Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

Figure 2 for Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

Figure 3 for Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

Figure 4 for Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

Abstract:People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how aligned robot behaviors are with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort required for a user to teach the robot their preferences. Specifying these representations -- what "features" of the robot's behavior matter to users -- remains a difficult problem; Features learned from raw data lack semantic meaning and features learned from user data require users to engage in tedious labeling processes. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search; they explore behaviors that they find interesting and ignore behaviors that are irrelevant. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. We learned CLEA features from exploratory actions users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated CLEA features through a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.

* Accepted to HRI 2025

Via

Access Paper or Ask Questions

Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Nov 17, 2024

Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, Maja Matarić

Figure 1 for Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Figure 2 for Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Figure 3 for Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Figure 4 for Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Abstract:Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks. This project's code is hosted at github.com/interaction-lab/CMA-ES-IG

* Accepted to ISRR

Via

Access Paper or Ask Questions

Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

Nov 07, 2024

Robby Costales, Stefanos Nikolaidis

Abstract:The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot generalization--the goal of standard reinforcement learning (RL)--in favor of few-shot adaptation, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity--a similarly prohibitive assumption. In this work, we present DIVA, an evolutionary approach for generating diverse training tasks in such complex, open-ended simulators. Like unsupervised environment design (UED) methods, DIVA can be applied to arbitrary parameterizations, but can additionally incorporate realistically-available domain knowledge--thus inheriting the flexibility and generality of UED, and the supervised structure embedded in well-designed simulators exploited by DR and PG. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior, far outperforming competitive baselines from prior literature. These findings highlight the potential of such semi-supervised environment design (SSED) approaches, of which DIVA is the first humble constituent, to enable training in realistic simulated domains, and produce more robust and capable adaptive agents.

* NeurIPS 2024

Via

Access Paper or Ask Questions