Alan Fern

Oregon State University

Evaluating Robots Like Human Infants: A Case Study of Learned Bipedal Locomotion

Jul 08, 2025

Devin Crowley, Whitney G. Cole, Christina M. Hospodar, Ruiting Shen, Karen E. Adolph, Alan Fern

Abstract:Typically, learned robot controllers are trained via relatively unsystematic regimens and evaluated with coarse-grained outcome measures such as average cumulative reward. The typical approach is useful to compare learning algorithms but provides limited insight into the effects of different training regimens and little understanding about the richness and complexity of learned behaviors. Likewise, human infants and other animals are "trained" via unsystematic regimens, but in contrast, developmental psychologists evaluate their performance in highly-controlled experiments with fine-grained measures such as success, speed of walking, and prospective adjustments. However, the study of learned behavior in human infants is limited by the practical constraints of training and testing babies. Here, we present a case study that applies methods from developmental psychology to study the learned behavior of the simulated bipedal robot Cassie. Following research on infant walking, we systematically designed reinforcement learning training regimens and tested the resulting controllers in simulated environments analogous to those used for babies--but without the practical constraints. Results reveal new insights into the behavioral impact of different training regimens and the development of Cassie's learned behaviors relative to infants who are learning to walk. This interdisciplinary baby-robot approach provides inspiration for future research designed to systematically test effects of training on the development of complex learned robot behaviors.

* 7 pages, 4 figures, accepted into ICDL 2025 as a contributed paper

Transfer Learning via Auxiliary Labels with Application to Cold-Hardiness Prediction

Apr 17, 2025

Kristen Goebel, Paola Pesantez-Cabrera, Markus Keller, Alan Fern

Abstract:Cold temperatures can cause significant frost damage to fruit crops depending on their resilience, or cold hardiness, which changes throughout the dormancy season. This has led to the development of predictive cold-hardiness models, which help farmers decide when to deploy expensive frost-mitigation measures. Unfortunately, cold-hardiness data for model training is only available for some fruit cultivars due to the need for specialized equipment and expertise. However, farmers often do have years of phenological data (e.g. date of budbreak) that they regularly collect for their crops. In this work, we introduce a new transfer-learning framework, Transfer via Auxiliary Labels (TAL), that allows farmers to leverage the phenological data to produce more accurate cold-hardiness predictions, even when no cold-hardiness data is available for their specific crop. The framework assumes a set of source tasks (cultivars) where each has associated primary labels (cold hardiness) and auxiliary labels (phenology). However, the target task (new cultivar) is assumed to only have the auxiliary labels. The goal of TAL is to predict primary labels for the target task via transfer from the source tasks. Surprisingly, despite the vast literature on transfer learning, to our knowledge, the TAL formulation has not been previously addressed. Thus, we propose several new TAL approaches based on model selection and averaging that can leverage recent deep multi-task models for cold-hardiness prediction. Our results on real-world cold-hardiness and phenological data for multiple grape cultivars demonstrate that TAL can leverage the phenological data to improve cold-hardiness predictions in the absence of cold-hardiness data.
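The TAL setup in the abstract above (source tasks with primary and auxiliary labels, a target task with auxiliary labels only) can be illustrated with a minimal selection-based sketch. This is not the authors' implementation: the `Model` type, the squared-error criterion, and all function names are hypothetical, chosen only to show one plausible way that auxiliary labels could guide the choice of a source model.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical model type (an assumption for this sketch): maps one input
# to a pair (primary_prediction, auxiliary_prediction).
Model = Callable[[float], Tuple[float, float]]


def auxiliary_error(model: Model, inputs: Sequence[float],
                    aux_labels: Sequence[float]) -> float:
    """Mean squared error of the model's auxiliary predictions on the target task."""
    return sum((model(x)[1] - y) ** 2 for x, y in zip(inputs, aux_labels)) / len(inputs)


def select_source_model(models: Dict[str, Model], inputs: Sequence[float],
                        aux_labels: Sequence[float]) -> str:
    """Pick the source model whose auxiliary predictions best fit the target's auxiliary labels."""
    return min(models, key=lambda name: auxiliary_error(models[name], inputs, aux_labels))


def predict_primary(models: Dict[str, Model], inputs: Sequence[float],
                    aux_labels: Sequence[float], query: float) -> float:
    """Predict the target's primary label using the selected source model."""
    best = select_source_model(models, inputs, aux_labels)
    return models[best](query)[0]
```

An averaging variant could weight each source model's primary prediction by its auxiliary fit instead of picking a single model; the abstract mentions both selection and averaging but does not specify the details.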

Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios

Mar 03, 2025

Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern

Abstract:Missing values in multivariate time series data can harm machine learning performance and introduce bias. These gaps arise from sensor malfunctions, blackouts, and human error and are typically addressed by data imputation. Previous work has tackled the imputation of missing data in random, complete blackouts and forecasting scenarios. The current paper addresses a more general missing pattern, which we call "partial blackout," where a subset of features is missing for consecutive time steps. We introduce a two-stage imputation process using self-attention and diffusion processes to model feature and temporal correlations. Notably, our model effectively handles missing data during training, enhancing adaptability and ensuring reliable imputation and performance, even with incomplete datasets. Our experiments on benchmark and two real-world time series datasets demonstrate that our model outperforms the state-of-the-art in partial blackout scenarios and shows better scalability.

* 7 pages, 2 figures, 3 tables, Accepted in AAAI 2025 Main Track
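The "partial blackout" pattern named in the abstract above (a subset of features missing for consecutive time steps) can be made concrete with a small mask generator. This is only an illustrative sketch: the function name, its parameters, and the convention that `True` means "observed" are assumptions, not taken from the paper.

```python
from typing import List, Sequence


def partial_blackout_mask(n_steps: int, n_features: int,
                          features: Sequence[int],
                          start: int, length: int) -> List[List[bool]]:
    """Build an observation mask with a partial blackout.

    mask[t][f] is True when feature f is observed at time step t and
    False when it is missing. The listed features are missing for the
    consecutive steps start .. start + length - 1 (clipped to n_steps).
    """
    if not 0 <= start <= n_steps:
        raise ValueError("start must lie within the series")
    end = min(start + length, n_steps)
    missing = set(features)
    return [
        [not (start <= t < end and f in missing) for f in range(n_features)]
        for t in range(n_steps)
    ]


# Features 0 and 2 are blacked out for time steps 1 and 2 of a 5-step series.
mask = partial_blackout_mask(n_steps=5, n_features=3, features=[0, 2], start=1, length=2)
```

Under this convention, a complete blackout is the special case where `features` lists every feature, and a forecasting scenario is the case where the missing window runs to the end of the series.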

WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies

Feb 26, 2025

William Solow, Sandhya Saisubramanian, Alan Fern

Abstract:We introduce WOFOSTGym, a novel crop simulation environment designed to train reinforcement learning (RL) agents to optimize agromanagement decisions for annual and perennial crops in single and multi-farm settings. Effective crop management requires optimizing yield and economic returns while minimizing environmental impact, a complex sequential decision-making problem well suited for RL. However, the lack of simulators for perennial crops in multi-farm contexts has hindered RL applications in this domain. Existing crop simulators also do not support multiple annual crops. WOFOSTGym addresses these gaps by supporting 23 annual crops and two perennial crops, enabling RL agents to learn diverse agromanagement strategies in multi-year, multi-crop, and multi-farm settings. Our simulator offers a suite of challenging tasks for learning under partial observability, non-Markovian dynamics, and delayed feedback. WOFOSTGym's standard RL interface allows researchers without agricultural expertise to explore a wide range of agromanagement problems. Our experiments demonstrate the learned behaviors across various crop varieties and soil types, highlighting WOFOSTGym's potential for advancing RL-driven decision support in agriculture.

Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs

Feb 08, 2025

Aayam Shrestha, Pan Liu, German Ros, Kai Yuan, Alan Fern

Abstract:This work focuses on generating realistic, physically-based human behaviors from multi-modal inputs, which may only partially specify the desired motion. For example, the input may come from a VR controller providing arm motion and body velocity, partial key-point animation, computer vision applied to videos, or even higher-level motion goals. This requires a versatile low-level humanoid controller that can handle such sparse, under-specified guidance, seamlessly switch between skills, and recover from failures. Current approaches for learning humanoid controllers from demonstration data capture some of these characteristics, but none achieve them all. To this end, we introduce the Masked Humanoid Controller (MHC), a novel approach that applies multi-objective imitation learning on augmented and selectively masked motion demonstrations. The training methodology results in an MHC that exhibits the key capabilities of catching up to out-of-sync input commands, combining elements from multiple motion sequences, and completing unspecified parts of motions from sparse multimodal input. We demonstrate these key capabilities for an MHC learned over a dataset of 87 diverse skills and showcase different multi-modal use cases, including integration with planning frameworks to highlight MHC's ability to solve new user-defined tasks without any finetuning.

* The European Conference on Computer Vision (ECCV), 2024

Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning

Dec 25, 2024

Yassine Chemingui, Aryan Deshwal, Honghao Wei, Alan Fern, Janardhan Rao Doppa

Abstract:Offline safe reinforcement learning (OSRL) involves learning, from a fixed batch of training data, a decision-making policy that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting at each state the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/yassineCh/CAPS.
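The test-time switching rule described in the abstract above can be sketched in a few lines. This is a hedged illustration rather than the released CAPS code: the `CandidatePolicy` class, its value-estimate fields, and the fallback to the lowest-cost policy when no candidate is feasible are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class CandidatePolicy:
    """Hypothetical container for one learned policy and its value estimates."""
    name: str
    act: Callable[[Any], Any]             # state -> action
    reward_value: Callable[[Any], float]  # estimated future reward from state
    cost_value: Callable[[Any], float]    # estimated future cost from state


def select_policy(policies: Sequence[CandidatePolicy], state: Any,
                  cost_limit: float) -> CandidatePolicy:
    """Among policies estimated to satisfy the cost limit, pick the highest-reward one."""
    feasible = [p for p in policies if p.cost_value(state) <= cost_limit]
    if feasible:
        return max(feasible, key=lambda p: p.reward_value(state))
    # Assumption for this sketch: if nothing is feasible, fall back to the
    # policy with the lowest estimated cost.
    return min(policies, key=lambda p: p.cost_value(state))
```

Because the cost limit is an argument to the selection step rather than something baked into training, the same set of policies can be reused when the constraint changes at deployment, which is the adaptability the abstract emphasizes.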

GABAR: Graph Attention-Based Action Ranking for Relational Policy Learning

Dec 06, 2024

Rajesh Mangannavar, Stefan Lee, Alan Fern, Prasad Tadepalli

Abstract:We propose a novel approach to learn relational policies for classical planning based on learning to rank actions. We introduce a new graph representation that explicitly captures action information and propose a Graph Neural Network architecture augmented with Gated Recurrent Units (GRUs) to learn action rankings. Our model is trained on small problem instances and generalizes to significantly larger instances where traditional planning becomes computationally expensive. Experimental results across standard planning benchmarks demonstrate that our action-ranking approach achieves generalization to significantly larger problems than those used in training.

* 6 Pages, 1 figure

Hierarchical Object-Oriented POMDP Planning for Object Rearrangement

Dec 02, 2024

Rajesh Mangannavar, Alan Fern, Prasad Tadepalli

Abstract:We present an online planning framework for solving multi-object rearrangement problems in partially observable, multi-room environments. Current object rearrangement solutions, primarily based on Reinforcement Learning or hand-coded planning methods, often lack adaptability to diverse challenges. To address this limitation, we introduce a novel Hierarchical Object-Oriented Partially Observed Markov Decision Process (HOO-POMDP) planning approach. This approach comprises (a) an object-oriented POMDP planner generating sub-goals, (b) a set of low-level policies for sub-goal achievement, and (c) an abstraction system converting the continuous low-level world into a representation suitable for abstract planning. We evaluate our system on varying numbers of objects, rooms, and problem types in AI2-THOR simulated environments with promising results.

* 17 pages, 2 Figures. Preprint. Under review at ICLR 2025

Learning Decentralized Multi-Biped Control for Payload Transport

Jun 25, 2024

Bikram Pandit, Ashutosh Gupta, Mohitvishnu S. Gadde, Addison Johnson, Aayam Kumar Shrestha, Helei Duan, Jeremy Dao, Alan Fern

Abstract:Payload transport over flat terrain via multi-wheel robot carriers is well-understood, highly effective, and configurable. In this paper, our goal is to provide similar effectiveness and configurability for transport over rough terrain that is more suitable for legs than wheels. For this purpose, we consider multi-biped robot carriers, where wheels are replaced by multiple bipedal robots attached to the carrier. Our main contribution is to design a decentralized controller for such systems that can be effectively applied to varying numbers and configurations of rigidly attached bipedal robots without retraining. We present a reinforcement learning approach for training the controller in simulation that supports transfer to the real world. Our experiments in simulation provide quantitative metrics showing the effectiveness of the approach over a wide variety of simulated transport scenarios. In addition, we demonstrate the controller in the real world for systems composed of two and three Cassie robots. To our knowledge, this is the first example of a scalable multi-biped payload transport system.

* Submitted to CoRL 2024, Project website: decmbc.github.io

Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking

Apr 30, 2024

Bart van Marum, Aayam Shrestha, Helei Duan, Pranay Dugar, Jeremy Dao, Alan Fern

Abstract:A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.

* 8 pages, 5 figs
