Abstract:Accurate interpretation of Electrocardiogram (ECG) signals is pivotal for diagnosing cardiovascular diseases. Integrating ECG signals with their accompanying textual reports holds immense potential to enhance clinical diagnostics through the combination of physiological data and qualitative insights. However, this integration faces significant challenges due to inherent modality disparities and the scarcity of labeled data for robust cross-modal learning. To address these obstacles, we propose C-MELT, a novel framework that pre-trains ECG and text data using a contrastive masked auto-encoder architecture. C-MELT uniquely combines the strengths of generative with enhanced discriminative capabilities to achieve robust cross-modal representations. This is accomplished through masked modality modeling, specialized loss functions, and an improved negative sampling strategy tailored for cross-modal alignment. Extensive experiments on five public datasets across diverse downstream tasks demonstrate that C-MELT significantly outperforms existing methods, achieving 15% and 2% increases in linear probing and zero-shot performance over state-of-the-art models, respectively. These results highlight the effectiveness of C-MELT, underscoring its potential to advance automated clinical diagnostics through multi-modal representations.
Abstract:Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still, or while breathing heavily). Yet, performing accurate, continuous and non-obtrusive RR monitoring across diverse daily routines and activities remains challenging. In this work, we present RespEar, an earable-based system for robust RR monitoring. By leveraging the unique properties of in-ear microphones in earbuds, RespEar enables the use of Respiratory Sinus Arrhythmia (RSA) and Locomotor Respiratory Coupling (LRC), physiological couplings between cardiovascular activity, gait and respiration, to indirectly determine RR. This effectively addresses the challenges posed by the almost imperceptible breathing signals under daily activities. We further propose a suite of meticulously crafted signal processing schemes to improve RR estimation accuracy and robustness. With data collected from 18 subjects over 8 activities, RespEar measures RR with a mean absolute error (MAE) of 1.48 breaths per minutes (BPM) and a mean absolute percent error (MAPE) of 9.12% in sedentary conditions, and a MAE of 2.28 BPM and a MAPE of 11.04% in active conditions, respectively, which is unprecedented for a method capable of generalizing across conditions with a single modality.
Abstract:Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture, where the selector routes each input sample to the appropriate classifier for classification. DiTMoS is grounded on a key insight: a composition of weak models can exhibit high diversity and the union of them can significantly boost the accuracy upper bound. To approach the upper bound, DiTMoS introduces three strategies including diverse training data splitting to increase the classifiers' diversity, adversarial selector-classifiers training to ensure synergistic interactions thereby maximizing their complementarity, and heterogeneous feature aggregation to improve the capacity of classifiers. We further propose a network slicing technique to alleviate the extra memory overhead incurred by feature aggregation. We deploy DiTMoS on the Neucleo STM32F767ZI board and evaluate it based on three time-series datasets for human activity recognition, keywords spotting, and emotion recognition, respectively. The experiment results manifest that: (a) DiTMoS achieves up to 13.4% accuracy improvement compared to the best baseline; (b) network slicing almost completely eliminates the memory overhead incurred by feature aggregation with a marginal increase of latency.
Abstract:Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's output. However, existing uncertainty estimation techniques often require substantial computational resources and memory, making them impractical for implementation on microcontrollers (MCUs). This limitation hinders the feasibility of many important on-device wearable event detection (WED) applications, such as heart attack detection. In this paper, we present UR2M, a novel Uncertainty and Resource-aware event detection framework for MCUs. Specifically, we (i) develop an uncertainty-aware WED based on evidential theory for accurate event detection and reliable uncertainty estimation; (ii) introduce a cascade ML framework to achieve efficient model inference via early exits, by sharing shallower model layers among different event models; (iii) optimize the deployment of the model and MCU library for system efficiency. We conducted extensive experiments and compared UR2M to traditional uncertainty baselines using three wearable datasets. Our results demonstrate that UR2M achieves up to 864% faster inference speed, 857% energy-saving for uncertainty estimation, 55% memory saving on two popular MCUs, and a 22% improvement in uncertainty quantification performance. UR2M can be deployed on a wide range of MCUs, significantly expanding real-time and reliable WED applications.
Abstract:In this article, analog microwave short-time Fourier transform (STFT) with improved measurement performance is implemented in the optical domain by employing stimulated Brillouin scattering (SBS) and channelization. By jointly using three optical frequency combs and filter- and SBS-based frequency-to-time mapping (FTTM), the time-frequency information of the signal under test (SUT) in different frequency intervals is measured in different channels. Then, by using the channel label introduced through subcarriers after photodetection, the obtained low-speed electrical pulses in different channels mixed in the time domain are distinguished and the time-frequency information of the SUT in different channels is respectively obtained and spliced to implement the STFT. For the first time, channelization measurement technology is introduced in the STFT system based on frequency sweeping and FTTM, greatly reducing the frequency-sweep range of the required frequency-sweep signal to the analysis bandwidth divided by the number of channels. In addition, channelization can also be used to improve the time and frequency resolution of the STFT system. A proof-of-concept experiment is performed. 12-GHz and 10-GHz analysis bandwidth is implemented by using a 4-GHz frequency-sweep signal and 3 channels and a 2-GHz frequency-sweep signal and 5 channels. Measurement performance improvement is also demonstrated.
Abstract:A photonics-enabled wavelet-like transform system, characterized by multi-resolution time-frequency analysis, is proposed based on a typical stimulated Brillouin scattering (SBS) pump-probe setup using an optical nonlinear frequency-sweep signal. In the pump path, a continuous-wave optical signal is injected into an SBS medium to generate an SBS gain. In the probe path, a periodic nonlinear frequency-sweep optical signal with a time-varying chirp rate is generated, which is then modulated at a Mach-Zehnder modulator (MZM) by the electrical signal under test (SUT). The optical signal from the MZM is selectively amplified by the SBS gain and converted back to the electrical domain using a low-speed photodetector, implementing the periodic SBS-based frequency-to-time mapping (FTTM). The frequency-domain information corresponding to different periods is mapped to the time domain via the FTTM in the form of low-speed electrical pulses, which is then spliced to analyze the time-frequency relationship of the SUT in real-time. The time-varying chirp rate in each sweep period makes the signals with different frequencies have different frequency resolutions in the FTTM process, which is very similar to the characteristics of the wavelet transform, so we call it wavelet-like transform. An experiment is carried out. Multi-resolution time-frequency analysis of a variety of RF signals is carried out in a 4-GHz bandwidth limited only by the equipment.
Abstract:In this paper, the filter- and frequency-to-time mapping (FTTM)-based photonics-assisted time and frequency acquisition methods are comprehensively analyzed and the accuracy and resolution limitation in the fast sweep scenario is broken by broadening the filter bandwidth. It is found that when the sweep speed is very fast, the width of the generated pulse via FTTM is mainly determined by the impulse response of the filter. In this case, appropriately increasing the filter bandwidth can significantly reduce the pulse width, so as to improve the measurement accuracy and resolution. FTTM-based short-time Fourier transform (STFT) and microwave frequency measurement using the stimulated Brillouin scattering (SBS) effect is demonstrated by comparing the results with and without SBS gain spectrum broadening and the improvement of measurement accuracy and frequency resolution is well confirmed. The frequency measurement accuracy of the system is improved by around 25 times compared with the former work using a similar sweep speed, while the frequency resolution of the STFT is also much improved compared with our former results.
Abstract:A photonics-based short-time Fourier transform (STFT) system is proposed and experimentally demonstrated based on stimulated Brillouin scattering (SBS) without using high-frequency electronic devices and equipment. The wavelength of a distributed feedback laser diode is periodically swept by using a low-speed periodic sawtooth/triangular driving current. The periodic frequency-sweep optical signal is modulated by the signal under test (SUT) and then injected into a section of SBS medium. The optical signal from another laser diode as the pump wave is reversely injected into the SBS medium. After simply detecting the forward transmission optical signals in a low-speed photodetector, the STFT of the SUT can be implemented. The system is characterized by the absence of any high-frequency electronic devices or equipment. An experiment is performed. The STFT of a variety of RF signals is carried out in a 4-GHz bandwidth. The dynamic frequency resolution is demonstrated to be around 60 MHz.
Abstract:Many deep learning applications, like keyword spotting, require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior works primarily focus on the incremental learning step, while ignoring the optimization during the base model training. We hypothesize that a more transferable and generalizable feature representation from the base model would be beneficial to incremental learning. In this work, we adopt multitask learning during base model training to improve the feature generalizability. Specifically, instead of training a single model with all the base classes, we decompose the base classes into multiple subsets and regard each of them as a task. These tasks are trained concurrently and a shared feature extractor is obtained for incremental learning. We evaluate our approach on two datasets under various configurations. The results show that our approach enhances the average incremental learning accuracy by up to 5.5%, which enables more reliable and accurate keyword spotting over time. Moreover, the proposed approach can be combined with many existing techniques and provides additional performance gain.
Abstract:A time-varying microwave photonic filter (TV-MPF) based on stimulated Brillouin scattering (SBS) is proposed and utilized to suppress the in-band noise of broadband arbitrary microwave waveforms, thereby improving the signal-to-noise ratio (SNR). The filter-controlling signal is designed according to the signal to be filtered and drives the TV-MPF so that the passband of the filter is always aligned with the frequencies of the signal to be filtered. By continuously tracking the signal spectral component, the TV-MPF only retains the spectral components of the signal and filters out the noise other than the spectral component of the signal at the current time, so as to improve the in-band SNR of the signal to be filtered. An experiment is performed. A variety of signals with different formats and in-band SNRs are used to test the noise suppression capability of the TV-MPF, and the waveform mean-square error is calculated to quantify the improvement of the signal, demonstrating the excellent adaptability of the proposed TV-MPF to different kinds of signals.