Driven by the progress in efficient embedded processing, there is an accelerating trend toward running machine learning models directly on wearable Brain-Machine Interfaces (BMIs) to improve portability and privacy and maximize battery life. However, achieving low latency and high classification performance remains challenging due to the inherent variability of electroencephalographic (EEG) signals across sessions and the limited onboard resources. This work proposes a comprehensive BMI workflow based on a CNN-based Continual Learning (CL) framework, allowing the system to adapt to inter-session changes. The workflow is deployed on a wearable, parallel ultra-low power BMI platform (BioGAP). Our results based on two in-house datasets, Dataset A and Dataset B, show that the CL workflow improves average accuracy by up to 30.36% and 10.17%, respectively. Furthermore, when implementing the continual learning on a Parallel Ultra-Low Power (PULP) microcontroller (GAP9), it achieves an energy consumption as low as 0.45mJ per inference and an adaptation time of only 21.5ms, yielding around 25h of battery life with a small 100mAh, 3.7V battery on BioGAP. Our setup, coupled with the compact CNN model and on-device CL capabilities, meets users' needs for improved privacy, reduced latency, and enhanced inter-session performance, offering good promise for smart embedded real-world BMIs.