While affective computing has advanced considerably, multimodal emotion prediction in aging populations remains underexplored, largely due to the scarcity of dedicated datasets. Existing multimodal benchmarks predominantly target young, cognitively healthy subjects, neglecting the influence of cognitive decline on emotional expression and physiological responses. To bridge this gap, we present MECO, a Multimodal dataset for Emotion and Cognitive understanding in Older adults. MECO includes 42 participants and provides approximately 38 hours of multimodal signals, yielding 30,592 synchronized samples. To maximize ecological validity, data collection followed standardized protocols within community-based settings. The modalities cover video, audio, electroencephalography (EEG), and electrocardiography (ECG). In addition, the dataset offers comprehensive annotations of emotional and cognitive states, including self-assessed valence, arousal, six basic emotions, and Mini-Mental State Examination cognitive scores. We further establish baseline benchmarks for both emotion and cognitive prediction. MECO serves as a foundational resource for multimodal modeling of affect and cognition in aging populations, facilitating downstream applications such as personalized emotion recognition and early detection of mild cognitive impairment (MCI) in real-world settings. The complete dataset and supplementary materials are available at https://maitrechen.github.io/meco-page/.
Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on access to raw signal recordings, limiting their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised framework for learning clinically generalized representations from ECG images through dual physiological-aware alignments: 1) Our approach optimizes image representation learning using multimodal contrastive alignment between image and gold-standard signal-text modalities. 2) We further integrate domain knowledge via soft-lead constraints, regularizing the reconstruction process and improving signal lead inter-consistency. Extensive benchmarking across multiple datasets and downstream tasks demonstrates that our image-based model achieves superior performance compared to existing image baselines and notably narrows the gap between ECG image and signal analysis. These results highlight the potential of self-supervised image modeling to unlock large-scale legacy ECG data and broaden access to automated cardiovascular diagnostics.
Electrocardiogram (ECG) foundation models represent a paradigm shift from task-specific pipelines to generalizable architectures pre-trained on large-scale unlabeled waveform data. This survey presents a unified and deployment-aware review of foundation models and medical large language models (LLMs) for ECG intelligence in cardiovascular disease (CVD) diagnosis, monitoring, and clinical decision support. The central thesis of this survey paper is that next-generation cardiovascular AI systems will be inherently agentic, requiring the synergistic integration of two complementary model classes: (i) ECG foundation models that act as signal-level interpreters, learning rich electrophysiological representations via self-supervised and multimodal pretraining, and (ii) medical LLMs, trained on biomedical text corpora, that function as knowledge-based reasoning backbones for contextual inference, guideline alignment, and clinical decision support. Thus, the survey systematically reviews existing pool of generalist medical LLMs, as well as ECG foundation models that utilize techniques such as self-supervised learning, multimodal ECG-language alignment, vision transformer architectures, and possess capabilities such as zero-shot classification, automated report generation, and longitudinal risk modeling. Recognizing the constraints of consumer-grade wearable edge devices, we further examine model optimization techniques such as quantization, pruning, knowledge distillation, as well as the role of small language models in enabling low-latency, energy-efficient, and privacy-preserving ECG intelligence on edge platforms such as smartwatches. Finally, we outline future directions in multimodal ECG foundation models, agent-driven monitoring, and explainable, secure edge intelligence, with particular emphasis on real-time, on-device cardiovascular analytics in consumer electronics ecosystems.
ECG digitization could unlock billions of archived clinical records, yet existing methods collapse on real-world images despite strong benchmark numbers. We introduce \textbf{VLM-in-the-Loop}, a plug-in quality assurance module that wraps any digitization backend with closed-loop VLM feedback via a standardized interface, requiring no modification to the underlying digitizer. The core mechanism is \textbf{tool grounding}: anchoring VLM assessment in quantitative evidence from domain-specific signal analysis tools. In a controlled ablation on 200 records with paired ground truth, tool grounding raises verdict consistency from 71\% to 89\% and doubles fidelity separation ($Δ$PCC 0.03 $\rightarrow$ 0.08), with the effect replicating across three VLMs (Claude Opus~4, GPT-4o, Gemini~2.5 Pro), confirming a pattern-level rather than model-specific gain. Deployed across four backends, the module improves every one: 29.4\% of borderline leads improved on our pipeline; 41.2\% of failed limb leads recovered on ECG-Digitiser; valid leads per image doubled on Open-ECG-Digitizer (2.5 $\rightarrow$ 5.8). On 428 real clinical HCM images, the integrated system reaches 98.0\% Excellent quality. Both the plug-in architecture and tool-grounding mechanism are domain-parametric, suggesting broader applicability wherever quality criteria are objectively measurable.
Deep learning training is non-deterministic: identical code with different random seeds produces models that agree on aggregate metrics but disagree on individual predictions, with per-class AUC swings exceeding 20 percentage points on rare clinical classes. We present a framework for verified bit-identical training that eliminates three sources of randomness: weight initialization (via structured orthogonal basis functions), batch ordering (via golden ratio scheduling), and non-deterministic GPU operations (via architecture selection and custom autograd). The pipeline produces MD5-verified identical trained weights across independent runs. On PTB-XL ECG rhythm classification, structured initialization significantly exceeds Kaiming across two architectures (n=20; Conformer p = 0.016, Baseline p < 0.001), reducing aggregate variance by 2-3x and reducing per-class variability on rare rhythms by up to 7.5x (TRIGU range: 4.1pp vs 30.9pp under Kaiming, independently confirmed by 3-fold CV). A four-basis comparison at n=20 shows all structured orthogonal bases produce equivalent performance (Friedman p=0.48), establishing that the contribution is deterministic structured initialization itself, not any particular basis function. Cross-domain validation on seven MedMNIST benchmarks (n=20, all p > 0.14) confirms no performance penalty on standard tasks; per-class analysis on imbalanced tasks (ChestMNIST, RetinaMNIST) shows the same variance reduction on rare classes observed in ECG. Cross-dataset evaluation on three external ECG databases confirms zero-shot generalization (>0.93 AFIB AUC).
Low left ventricular ejection fraction (LEF) frequently remains undetected until progression to symptomatic heart failure, underscoring the need for scalable screening strategies. Although artificial intelligence-enabled electrocardiography (AI-ECG) has shown promise, existing approaches rely solely on end-to-end black-box models with limited interpretability or on tabular systems dependent on commercial ECG measurement algorithms with suboptimal performance. We introduced ECG-based Predictor-Driven LEF (ECGPD-LEF), a structured framework that integrates foundation model-derived diagnostic probabilities with interpretable modeling for detecting LEF from ECG. Trained on the benchmark EchoNext dataset comprising 72,475 ECG-echocardiogram pairs and evaluated in predefined independent internal (n=5,442) and external (n=16,017) cohorts, our framework achieved robust discrimination for moderate LEF (internal AUROC 88.4%, F1 64.5%; external AUROC 86.8%, F1 53.6%), consistently outperforming the official end-to-end baseline provided with the benchmark across demographic and clinical subgroups. Interpretability analyses identified high-impact predictors, including normal ECG, incomplete left bundle branch block, and subendocardial injury in anterolateral leads, driving LEF risk estimation. Notably, these predictors independently enabled zero-shot-like inference without task-specific retraining (internal AUROC 75.3-81.0%; external AUROC 71.6-78.6%), indicating that ventricular dysfunction is intrinsically encoded within structured diagnostic probability representations. This framework reconciles predictive performance with mechanistic transparency, supporting scalable enhancement through additional predictors and seamless integration with existing AI-ECG systems.
Foundation models have recently improved electrocardiogram (ECG) representation learning, but their deployment can be limited by computational cost and latency constraints. In this work, we fine-tune ECGFounder as a high-capacity teacher for binary ECG classification on PTB-XL and the MIT-BIH Arrhythmia Database, and investigate whether knowledge distillation can transfer its predictive behavior to compact students. We evaluate two classical 1D students (ResNet-1D and a lightweight CNN-1D) and a quantum-ready pipeline that combines a convolutional autoencoder, which compresses 256-sample ECG windows into a low-dimensional latent representation, with a 6-qubit variational quantum circuit implemented in Qiskit and executed in a simulated backend. Across both datasets, the teacher provides the strongest overall performance, while distillation yields competitive students under a considerable reduction in trainable parameters. We further analyze the sensitivity of student performance to distillation settings, highlighting consistent accuracy--efficiency trade-offs when compressing a foundation ECG model into classical and quantum-ready learners under a unified evaluation protocol.
Electrocardiograms (ECGs) are among the most widely available clinical signals and play a central role in cardiovascular diagnosis. While recent foundation models (FMs) have shown promise for learning transferable ECG representations, most existing pretraining approaches treat leads as independent channels and fail to explicitly leverage their strong structural redundancy. We introduce the latent attention masked autoencoder (LAMAE) FM that directly exploits this structure by learning cross-lead connection mechanisms during self-supervised pretraining. Our approach models higher-order interactions across leads through latent attention, enabling permutation-invariant aggregation and adaptive weighting of lead-specific representations. We provide empirical evidence on the Mimic-IV-ECG database that leveraging the cross-lead connection constitutes an effective form of structural supervision, improving representation quality and transferability. Our method shows strong performance in predicting ICD-10 codes, outperforming independent-lead masked modeling and alignment-based baselines.
Climbing is a multifaceted sport that combines physical demands and emotional and cognitive challenges. Ascent styles differ in fall distance with lead climbing involving larger falls than top rope climbing, which may result in different perceived risk and fear. In this study, we investigated the psychophysiological relationship between perceived fear and muscle activity in climbers using a combination of statistical modeling and deep learning techniques. We conducted an experiment with 19 climbers, collecting electromyography (EMG), electrocardiography (ECG) and arm motion data during lead and top rope climbing. Perceived fear ratings were collected for the different phases of the climb. Using a linear mixed-effects model, we analyzed the relationships between perceived fear and physiological measures. To capture the non-linear dynamics of this relationship, we extended our analysis to deep learning models and integrated random effects for a personalized modeling approach. Our results showed that random effects improved model performance of the mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE). The results showed that muscle fatigue correlates significantly with increased fear during \textit{lead climbing}. This study highlights the potential of combining statistical and deep learning approaches for modeling the interplay between psychological and physiological states during climbing.
Deploying neural networks on microcontrollers is constrained by kilobytes of flash and SRAM, where 1x1 pointwise (PW) mixers often dominate memory even after INT8 quantization across vision, audio, and wearable sensing. We present HYPER-TINYPW, a compression-as-generation approach that replaces most stored PW weights with generated weights: a shared micro-MLP synthesizes PW kernels once at load time from tiny per-layer codes, caches them, and executes them with standard integer operators. This preserves commodity MCU runtimes and adds only a one-off synthesis cost; steady-state latency and energy match INT8 separable CNN baselines. Enforcing a shared latent basis across layers removes cross-layer redundancy, while keeping PW1 in INT8 stabilizes early, morphology-sensitive mixing. We contribute (i) TinyML-faithful packed-byte accounting covering generator, heads/factorization, codes, kept PW1, and backbone; (ii) a unified evaluation with validation-tuned t* and bootstrap confidence intervals; and (iii) a deployability analysis covering integer-only inference and boot versus lazy synthesis. On three ECG benchmarks (Apnea-ECG, PTB-XL, MIT-BIH), HYPER-TINYPW shifts the macro-F1 versus flash Pareto frontier: at about 225 kB it matches a roughly 1.4 MB CNN while being 6.31x smaller (84.15% fewer bytes), retaining at least 95% of large-model macro-F1. Under 32-64 kB budgets it sustains balanced detection where compact baselines degrade. The mechanism applies broadly to other 1D biosignals, on-device speech, and embedded sensing tasks where per-layer redundancy dominates, indicating a wider role for compression-as-generation in resource-constrained ML systems. Beyond ECG, HYPER-TINYPW transfers to TinyML audio: on Speech Commands it reaches 96.2% test accuracy (98.2% best validation), supporting broader applicability to embedded sensing workloads where repeated linear mixers dominate memory.