Abstract: Continual Reinforcement Learning (CRL) aims to develop lifelong learning agents that continuously acquire knowledge across diverse tasks while mitigating catastrophic forgetting. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to generalize rapidly to novel tasks. While various enhancement strategies for both aspects have been proposed, achieving scalable performance by directly applying RL to sequential task streams remains challenging. In this paper, we propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL, and continually distilling them into a central generalist model. This design is motivated by the observation that RL excels at solving single tasks, while policy distillation, a relatively stable supervised learning process, is well aligned with large foundation models and multi-task learning. Moreover, a mixture-of-experts (MoE) architecture and a replay-based approach are employed to enhance the plasticity and stability, respectively, of the continual policy distillation process. Extensive experiments on the Meta-World benchmark demonstrate that our framework enables efficient continual RL, recovering over 85% of teacher performance while constraining task-wise forgetting to within 10%.
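
The role of replay in continual policy distillation can be illustrated with a toy sketch. It is not the paper's implementation: the student here is a two-weight linear model rather than an MoE network, the teachers are hand-written functions standing in for RL-trained policies, and the names `continual_distill`, `features_for`, and `task_error` are hypothetical. The student imitates each teacher in sequence by minimizing the squared error between its action and the teacher's action; with replay enabled, every batch also contains stored teacher-labelled samples from earlier tasks.

```python
import random

def student_action(weights, features):
    """Linear student: action is the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def features_for(task_id, state):
    # Assumed encoding: one shared term plus one task-gated term, so that
    # learning task 1 can overwrite what was learned for task 0.
    return [state, state * task_id]

def distill_step(weights, batch, lr):
    """One mini-batch gradient step on the squared error to the teacher action."""
    grads = [0.0] * len(weights)
    for features, teacher_action in batch:
        err = student_action(weights, features) - teacher_action
        for i, x in enumerate(features):
            grads[i] += err * x / len(batch)
    return [w - lr * g for w, g in zip(weights, grads)]

def continual_distill(teachers, use_replay, steps=1000, batch_size=16, lr=0.5, seed=0):
    """Distill the teachers one after another into a single student."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    replay = []  # (features, teacher_action) pairs kept from earlier tasks
    for task_id, teacher in enumerate(teachers):
        for _ in range(steps):
            batch = []
            for _ in range(batch_size):
                s = rng.uniform(-1.0, 1.0)
                batch.append((features_for(task_id, s), teacher(s)))
            if use_replay and replay:
                # Rehearse earlier tasks alongside the current one.
                batch += rng.sample(replay, batch_size)
            weights = distill_step(weights, batch, lr)
        # Store teacher-labelled samples of this task for later rehearsal.
        for _ in range(256):
            s = rng.uniform(-1.0, 1.0)
            replay.append((features_for(task_id, s), teacher(s)))
    return weights

def task_error(weights, task_id, teacher):
    """Mean squared gap between student and teacher on a fixed grid of states."""
    states = [i / 10.0 for i in range(-10, 11)]
    gaps = [(student_action(weights, features_for(task_id, s)) - teacher(s)) ** 2
            for s in states]
    return sum(gaps) / len(gaps)

# Two stand-in teachers with conflicting behavior on the shared weight.
teachers = [lambda s: 1.0 * s, lambda s: -0.5 * s]
with_replay = continual_distill(teachers, use_replay=True)
without_replay = continual_distill(teachers, use_replay=False)
```

Comparing `task_error(with_replay, 0, teachers[0])` with `task_error(without_replay, 0, teachers[0])` shows the effect on the first task: without rehearsal the shared weight drifts toward the second teacher, while the replayed samples keep both teachers' behavior in the objective.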


Abstract: We present DIY-IPS (Do It Yourself Indoor Positioning System), an open-source, real-time indoor positioning mobile application. DIY-IPS detects users' indoor positions by employing dual-band RSSI fingerprinting of available WiFi access points. The app can be used, without additional infrastructure costs, to detect users' indoor positions in real time. We released the app as open source to save other researchers the time of recreating it. The app enables researchers and users to (1) collect indoor positioning datasets with ground-truth labels, (2) customize the app for higher accuracy or other research purposes, and (3) test the accuracy of modified methods through live testing against ground truth. We ran preliminary experiments to demonstrate the effectiveness of the app.
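
As a rough illustration of dual-band RSSI fingerprinting, the sketch below matches a live WiFi scan against stored, labelled scans. It is not taken from the DIY-IPS code: the k-nearest-neighbour matching rule, the -100 dBm placeholder for unheard access points, the `(bssid, band)` key layout, and all names and readings are assumptions made for the example.

```python
import math
from collections import Counter

# Assumed placeholder RSSI for an access point that a scan did not hear.
MISSING_DBM = -100.0

def distance(scan_a, scan_b):
    """Euclidean distance between two scans keyed by (bssid, band)."""
    keys = set(scan_a) | set(scan_b)
    return math.sqrt(sum(
        (scan_a.get(k, MISSING_DBM) - scan_b.get(k, MISSING_DBM)) ** 2
        for k in keys
    ))

def locate(scan, fingerprints, k=3):
    """Return the majority label among the k stored fingerprints nearest to scan."""
    nearest = sorted(fingerprints, key=lambda fp: distance(scan, fp[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical fingerprint database: (scan, ground-truth label) pairs.
# Each access point contributes one reading per band it was heard on.
fingerprints = [
    ({("ap1", "2.4GHz"): -40.0, ("ap1", "5GHz"): -48.0, ("ap2", "2.4GHz"): -80.0}, "room_a"),
    ({("ap1", "2.4GHz"): -42.0, ("ap1", "5GHz"): -50.0, ("ap2", "2.4GHz"): -78.0}, "room_a"),
    ({("ap1", "2.4GHz"): -75.0, ("ap2", "2.4GHz"): -45.0, ("ap2", "5GHz"): -52.0}, "room_b"),
    ({("ap1", "2.4GHz"): -78.0, ("ap2", "2.4GHz"): -43.0, ("ap2", "5GHz"): -50.0}, "room_b"),
]

# Hypothetical live scan taken at an unknown position.
live_scan = {("ap1", "2.4GHz"): -41.0, ("ap1", "5GHz"): -49.0, ("ap2", "2.4GHz"): -79.0}
predicted = locate(live_scan, fingerprints, k=3)
```

Keeping the 2.4 GHz and 5 GHz readings of the same access point as separate dimensions is what makes the fingerprint dual-band in this sketch; the 5 GHz signal usually attenuates faster through walls, so it can help separate positions that look alike on 2.4 GHz alone.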