Abstract:Continual Federated Learning (CFL) is essential for enabling real-world applications where multiple decentralized clients adaptively learn from continuous data streams. A significant challenge in CFL is mitigating catastrophic forgetting, where models lose previously acquired knowledge when learning new information. Existing approaches often face difficulties due to the constraints of device storage capacities and the heterogeneous nature of data distributions among clients. While some CFL algorithms have addressed these challenges, they frequently rely on unrealistic assumptions about the availability of task boundaries (i.e., knowing when new tasks begin). To address these limitations, we introduce Fed-A-GEM, a federated adaptation of the A-GEM method (Chaudhry et al., 2019), which employs a buffer-based gradient projection approach. Fed-A-GEM alleviates catastrophic forgetting by leveraging local buffer samples and aggregated buffer gradients, thus preserving knowledge across multiple clients. Our method is combined with existing CFL techniques, enhancing their performance in the CFL context. Our experiments on standard benchmarks show consistent performance improvements across diverse scenarios. For example, in a task-incremental learning scenario using the CIFAR-100 dataset, our method can increase the accuracy by up to 27%. Our code is available at https://github.com/shenghongdai/Fed-A-GEM.
Abstract:Sensing technology is widely used for comprehending the physical world, with numerous modalities explored in past decades. While there has been considerable work on multi-modality learning, they all require data of all modalities be paired. How to leverage multi-modality data with partially pairings remains an open problem. To tackle this challenge, we introduce the Babel framework, encompassing the neural network architecture, data preparation and processing, as well as the training strategies. Babel serves as a scalable pre-trained multi-modal sensing neural network, currently aligning six sensing modalities, namely Wi-Fi, mmWave, IMU, LiDAR, video, and depth. To overcome the scarcity of complete paired data, the key idea of Babel involves transforming the N-modality alignment into a series of two-modality alignments by devising the expandable network architecture. This concept is also realized via a series of novel techniques, including the pre-trained modality tower that capitalizes on available single-modal networks, and the adaptive training strategy balancing the contribution of the newly incorporated modality with the previously established modality alignment. Evaluation demonstrates Babel's outstanding performance on eight human activity recognition datasets, compared to various baselines e.g., the top multi-modal sensing framework, single-modal sensing networks, and multi-modal large language models. Babel not only effectively fuses multiple available modalities (up to 22% accuracy increase), but also enhance the performance of individual modality (12% averaged accuracy improvement). Case studies also highlight exciting application scenarios empowered by Babel, including cross-modality retrieval (i.e., sensing imaging), and bridging LLM for sensing comprehension.