Abstract: Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage memory of past environments to better survive previously encountered stressors. From a control perspective, this adaptability poses significant challenges for driving cell populations toward extinction, an open problem of great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models of switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies for controlling cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics.
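As an illustration of the kind of control problem described in this abstract, the sketch below defines a toy dosing environment with switching between susceptible and resistant subpopulations. The growth rates, switching rates, drug-kill term, and reward are hypothetical placeholders, not the paper's model.

    import numpy as np

    class DosingEnv:
        """Toy two-phenotype population under drug dosing (illustrative only).

        State: (susceptible count, resistant count); action: drug dose in [0, 1].
        All parameters below are hypothetical, not taken from the referenced model.
        """
        def __init__(self, growth_s=0.05, growth_r=0.03, kill=0.1,
                     switch_sr=0.01, switch_rs=0.01, dt=1.0):
            self.p = dict(growth_s=growth_s, growth_r=growth_r, kill=kill,
                          switch_sr=switch_sr, switch_rs=switch_rs, dt=dt)
            self.reset()

        def reset(self):
            self.s, self.r = 100.0, 1.0
            return np.array([self.s, self.r])

        def step(self, dose):
            p, dt = self.p, self.p["dt"]
            # Susceptible cells grow, are killed by the drug, and switch phenotype.
            ds = (p["growth_s"] - p["kill"] * dose - p["switch_sr"]) * self.s + p["switch_rs"] * self.r
            # Resistant cells grow and switch back at a fixed rate.
            dr = (p["growth_r"] - p["switch_rs"]) * self.r + p["switch_sr"] * self.s
            self.s = max(self.s + dt * ds, 0.0)
            self.r = max(self.r + dt * dr, 0.0)
            total = self.s + self.r
            reward = -total - 0.1 * dose   # penalize population size and dosing
            done = total < 1.0             # extinction
            return np.array([self.s, self.r]), reward, done, {}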
Abstract: An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance, which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to improved performance.
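To make the bound-based idea concrete, the following is a minimal tabular soft Q-learning step in which the bootstrapped target is clipped to externally supplied bounds. The arrays q_lower and q_upper are illustrative placeholders standing in for the double-sided bounds discussed above; the update itself is the standard soft (log-sum-exp) Bellman backup.

    import numpy as np

    def soft_q_update(Q, q_lower, q_upper, s, a, r, s_next,
                      alpha=0.1, gamma=0.99, beta=5.0):
        """One tabular soft Q-learning step with the target clipped to known bounds.

        Q, q_lower, q_upper: arrays of shape (n_states, n_actions); the bound arrays
        are placeholders for whatever double-sided bounds are available.
        beta is the inverse temperature of the entropy regularization.
        """
        # Soft (log-sum-exp) state value of the successor state.
        v_next = (1.0 / beta) * np.log(np.sum(np.exp(beta * Q[s_next])))
        target = r + gamma * v_next
        # Clip the bootstrapped target to the admissible interval.
        target = np.clip(target, q_lower[s, a], q_upper[s, a])
        Q[s, a] += alpha * (target - Q[s, a])
        return Q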
Abstract: In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of the solutions to previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.
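For intuition, a zero-shot composite policy of the kind mentioned above can be built by applying a composition function to the primitive Q-functions and acting softly greedily with respect to the result. The specific composition used here (a convex combination) and the temperature are illustrative choices, not the particular functions studied in the paper.

    import numpy as np

    def zero_shot_policy(q_primitives, weights, state, beta=5.0):
        """Boltzmann policy from a convex combination of primitive Q-functions.

        q_primitives: list of arrays of shape (n_states, n_actions), one per solved task.
        weights: convex combination coefficients (an illustrative composition function).
        Returns action probabilities for the given state of the composite task.
        """
        q_comp = sum(w * q[state] for w, q in zip(weights, q_primitives))
        logits = beta * (q_comp - np.max(q_comp))   # shift for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()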
Abstract: An agent's ability to reuse solutions to previously solved problems is critical for learning new tasks efficiently. Recent research using composition of value functions in reinforcement learning has shown that agents can utilize solutions of primitive tasks to obtain solutions for exponentially many new tasks. However, previous work has relied on restrictive assumptions on the dynamics, the method of composition, and the structure of reward functions. Here we consider the case of general composition functions without any restrictions on the structure of reward functions, applicable to both deterministic and stochastic dynamics. For this general setup, we provide bounds on the corresponding optimal value functions and characterize the value of the associated policies. The derived theoretical results lead to improvements in training for both entropy-regularized and standard reinforcement learning, which we validate with numerical simulations.
Abstract: In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these new problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Alternatively, prior knowledge can be used to adjust the reward function for a new problem in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the optimal soft value functions for two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL. We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL. We validate these theoretical contributions with experiments showing that reward shaping and task composition lead to faster learning in various settings.
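As a point of reference for the shaping result mentioned above, the classical potential-based form of reward shaping (which leaves the optimal policy unchanged in the standard setting) can be written as below. The potential phi is an arbitrary user-supplied function, e.g. a value estimate carried over from a previously solved task; this sketch shows only the standard form, not the generalized relation derived in the paper.

    def shaped_reward(r, s, s_next, phi, gamma=0.99):
        """Potential-based reward shaping: r'(s, a, s') = r + gamma * phi(s') - phi(s).

        phi is any state-dependent potential, for example a (soft) value function
        estimate from a related, previously solved task.
        """
        return r + gamma * phi(s_next) - phi(s)

In practice, the shaped reward is simply substituted for the original reward during training, so the learner receives denser feedback while the underlying optimal policy is preserved.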
Abstract: This report discusses the application of neural networks (NNs) as models of small segments of the brain. The networks representing the biological connectome are altered both spatially and temporally. The degradation techniques applied here are "weight degradation", "weight scrambling", and variable activation functions. These methods aim to shed light on the study of neurodegenerative diseases such as Alzheimer's, Huntington's, and Parkinson's disease, as well as strokes and brain tumors that disrupt the flow of information in the brain's network. Fundamental insights into memory loss and generalized learning dysfunction are gained by monitoring the network's error function during degradation. The biological significance of each degradation method is also discussed.
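One plausible realization of the "weight degradation" procedure named above is to zero out a randomly chosen fraction of a trained network's weights and re-evaluate its error. The sketch below is illustrative and does not reproduce the report's exact protocol; the evaluate_error callback in the usage comment is hypothetical.

    import numpy as np

    def degrade_weights(weights, fraction, rng=None):
        """Zero out a random fraction of the entries in each weight matrix.

        weights: list of NumPy arrays (one per layer). Returns degraded copies,
        leaving the originals untouched. The fraction and masking scheme are
        illustrative choices, not the report's exact protocol.
        """
        rng = np.random.default_rng() if rng is None else rng
        degraded = []
        for W in weights:
            mask = rng.random(W.shape) >= fraction   # keep each weight with prob 1 - fraction
            degraded.append(W * mask)
        return degraded

    # Example: track the error function as degradation increases.
    # for f in np.linspace(0.0, 0.9, 10):
    #     print(f, evaluate_error(degrade_weights(trained_weights, f)))  # evaluate_error is hypothetical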