Malcolm A. MacIver

Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents

May 18, 2025

Shuo Han, German Espinosa, Junda Huang, Daniel A. Dombeck, Malcolm A. MacIver, Bradly C. Stadie

Abstract: Recent advances in reinforcement learning (RL) have demonstrated impressive capabilities in complex decision-making tasks. This progress raises a natural question: how do these artificial systems compare to biological agents, which have been shaped by millions of years of evolution? To help answer this question, we undertake a comparative study of biological mice and RL agents in a predator-avoidance maze environment. Through this analysis, we identify a striking disparity: RL agents consistently demonstrate a lack of self-preservation instinct, readily risking "death" for marginal efficiency gains. These risk-taking strategies contrast with those of biological agents, which exhibit sophisticated risk-assessment and avoidance behaviors. To help bridge this gap between biological and artificial agents, we propose two novel mechanisms that encourage more naturalistic risk-avoidance behaviors in RL agents. Our approach leads to the emergence of naturalistic behaviors, including strategic environment assessment, cautious path planning, and predator avoidance patterns that closely mirror those observed in biological systems.

* 19 pages

Achieving mouse-level strategic evasion performance using real-time computational planning

Nov 09, 2022

German Espinosa, Gabrielle E. Wink, Alexander T. Lai, Daniel A. Dombeck, Malcolm A. MacIver

Abstract: Planning is an extraordinary ability in which the brain imagines and then enacts evaluated possible futures. Using traditional planning models, computer scientists have attempted to replicate this capacity with some success but ultimately face a recurring limitation: as the plan grows in steps, the number of possible futures makes it intractable to determine the right sequence of actions to reach a goal state. Based on prior theoretical work on how the ecology of an animal governs the value of spatial planning, we developed a more efficient, biologically inspired planning algorithm, TLPPO. This algorithm allows us to achieve mouse-level predator evasion performance with orders of magnitude less computation than a widespread algorithm for planning in the situations of partial observability that typify predator-prey interactions. We compared the performance of a real-time agent using TLPPO against the performance of live mice, all tasked with evading a robot predator. We anticipate these results will be helpful to planning algorithm users and developers, as well as to areas of neuroscience where robot-animal interaction can provide a useful approach to studying the basis of complex behaviors.

* 6 pages, 4 figures, ICRA 2023

Feedback Synthesis For Underactuated Systems Using Sequential Second-Order Needle Variations

Apr 24, 2018

Giorgos Mamakoukas, Malcolm A. MacIver, Todd D. Murphey

Abstract: This paper derives nonlinear feedback control synthesis for general control affine systems using second-order actions (the second-order needle variations of optimal control) as the basis for choosing each control response to the current state. A second result of the paper is that the method provably exploits the nonlinear controllability of a system by virtue of an explicit dependence of the second-order needle variation on the Lie bracket between vector fields. As a result, each control decision necessarily decreases the objective when the system is nonlinearly controllable using first-order Lie brackets. Simulation results using a differential drive cart, an underactuated kinematic vehicle in three dimensions, and an underactuated dynamic model of an underwater vehicle demonstrate that the method finds control solutions when the first-order analysis is singular. Lastly, the underactuated dynamic underwater vehicle model demonstrates convergence even in the presence of a velocity field.

* 25 pages. arXiv admin note: text overlap with arXiv:1709.01947

Feedback Synthesis for Controllable Underactuated Systems using Sequential Second Order Actions

Sep 06, 2017

Giorgos Mamakoukas, Malcolm A. MacIver, Todd D. Murphey

Abstract: This paper derives nonlinear feedback control synthesis for general control affine systems using second-order actions (the needle variations of optimal control) as the basis for choosing each control response to the current state. A second result of the paper is that the method provably exploits the nonlinear controllability of a system by virtue of an explicit dependence of the second-order needle variation on the Lie bracket between vector fields. As a result, each control decision necessarily decreases the objective when the system is nonlinearly controllable using first-order Lie brackets. Simulation results using a differential drive cart, an underactuated kinematic vehicle in three dimensions, and an underactuated dynamic model of an underwater vehicle demonstrate that the method finds control solutions when the first-order analysis is singular. Moreover, the simulated examples demonstrate superior convergence when compared to synthesis based on first-order needle variations. Lastly, the underactuated dynamic underwater vehicle model demonstrates convergence even in the presence of a velocity field.

* Robotics: Science and Systems Proceedings, 2017
* 9 pages

Ergodic Exploration of Distributed Information

Aug 30, 2017

Lauren M. Miller, Yonatan Silverman, Malcolm A. MacIver, Todd D. Murphey

Abstract: This paper presents an active search trajectory synthesis technique for autonomous mobile robots with nonlinear measurements and dynamics. The presented approach uses the ergodicity of a planned trajectory with respect to an expected information density map to close the loop during search. The ergodic control algorithm does not rely on discretization of the search or action spaces, and is well posed for coverage with respect to the expected information density whether the information is diffuse or localized, thus trading off between exploration and exploitation in a single objective function. As a demonstration, we use a robotic electrolocation platform to estimate location and size parameters describing static targets in an underwater environment. Our results demonstrate that the ergodic exploration of distributed information (EEDI) algorithm outperforms commonly used information-oriented controllers, particularly when distractions are present.

* IEEE Transactions on Robotics, vol. 32, no. 1, pp. 36-52, 2016
* 17 pages
