Medical imaging interpretation is a foundational pillar of modern clinical diagnostics, yet the manual generation of radiology reports remains a time-consuming process prone to interpretation inconsistencies. Within the field of medical AI, automating these descriptions through deep learning promises to streamline clinical workflows and standardise diagnostic output. However, accurate disease detection and precise report generation remain significant challenges due to limitations in capturing fine-grained visual features and ensuring clinical coherence. To address these issues, we propose RL-ACRGNet, an improved encoder-decoder model that integrates a pre-trained DenseNet encoder with a multilevel LSTM decoder within an off-policy reinforcement learning framework. Using a dual-network approach to refine visual-semantic embeddings through a metric-based reward mechanism, we demonstrate that RL-ACRGNet consistently outperforms state-of-the-art baselines on the IU-Xray dataset, achieving quantitative improvements in BLEU-4 (0.47%), METEOR (0.17%) and ROUGE-L (0.518). Furthermore, comprehensive evaluations on the large-scale MIMIC-CXR data set confirm the robust generalisation of the model and its ability to generate high-quality, clinically relevant reports
Epilepsy is one of the most common neurological disorders globally, characterized by recurring seizures and significantly impacting the quality of life. Despite advancements in diagnostic techniques, the mitigation of risks faced by epilepsy patients remains challenging due to the unpredictability of seizure events. An accurate forecast of seizure onset helps to reduce risks in epilepsy patients. In this paper, we propose EEG-FuseFormer, a transformer-based feature fusion framework for seizure-onset prediction that combines intermediate features extracted from Convolutional Neural Networks-Long Short-Term Memory (CNN-LSTM) and ResNet-18 networks. The CNN-LSTM architecture captures both spatial and temporal features directly from the raw signal, whereas the ResNet-18 extracts features from the Short-Time Fourier Transform (STFT) representation of the EEG signals. Fusion is carried out using a transformer encoder, and the final prediction is generated using fully connected dense layers. The CHB-MIT dataset was used to validate the proposed model. The results show that the proposed model achieves a mean recall of 98.85% and outperforms most of the state-of-the-art methods. This study evaluates the ability of the proposed feature fusion model to generalize in cross-patient testing scenarios. Fine-tuning pre-trained models on limited target patient data (target adaptation) within the cross-patient validation framework results in higher recall, precision, and F1-score metrics in comparison to the conventional cross-patient validation approach. Finally, the runtime-based computational complexity of the model is assessed across diverse hardware platforms to highlight the performance-complexity trade-off.
The main goal of active finetuning is to improve a pretrained model's performance on a specific task or domain by finetuning it with carefully selected informative or challenging data. Previous research has predominantly focused on the active aspect (i.e., data selection) while uniformly employing full finetuning for model adaptation, which inevitably distorts pretrained features due to distribution shift. This issue becomes particularly pronounced when the model size is large relative to the finetuning data quantity, leading to heightened overfitting risks. To address this critical gap, we formally outline the FiAF task that emphasizes systematic exploration of finetuning methodologies in active learning. We propose FACT, a three-phase hierarchical finetuning framework featuring both efficiency and simplicity, specifically designed for active finetuning scenarios. Our comprehensive experiments span: (1) Three major dataset categories encompassing classic (CIFAR10, CIFAR100, ImageNet-1k), imbalanced (CIFAR10-LT, CIFAR100-LT), and fine-grained (StanfordCars, FGVCAircraft) image classification datasets, each evaluated under 3-5 distinct sampling ratios; (2) Diverse pretrained architectures including Convolutional Neural Network (ConvNeXt), Vision Transformer (ViT), and Vision LSTM (ViL) networks; (3) A systematic investigation of frozen feature augmentation (FroFA) strategies. (4) A comprehensive and rigorous analysis of efficiency and generalizability. The results demonstrate significant improvements with strong generalization and robustness. Notably, under low sampling ratios, our framework achieves remarkable performance gains of over 20% on the ViT model for CIFAR10, CIFAR100, and ImageNet-1k benchmarks. This systematic approach establishes new state-of-the-art performance while maintaining parameter efficiency, proving particularly effective when labeled data is scarce.
Modern network intrusion detection systems (NIDS) are caught in a structural contradiction: the protocols carrying the highest threat intelligence are precisely those encrypted under TLS 1.3 and QUIC, where payload inspection yields nothing. We ask a simpler question -- what if the attack signature is not in the bytes, but in the rhythm? -- and answer it by treating network flows as a language whose grammar is written entirely in L3/L4 packet metadata: length, inter-arrival time, TTL, TCP flags, and hashed port numbers. We present PLM-NIDS, which proves three claims in sequence. (1) The grammar exists and is learnable: a RWKV-4 state-space model trained on 344,232 unlabelled Monday flows achieves a causal LM validation loss of 0.204, demonstrating that benign traffic has predictable, statistically consistent structure. (2) Attacks violate this grammar: the per-flow perplexity score cleanly separates benign from attack flows with PR-AUC = 0.93 using zero attack labels at training time. (3) This separation is architecturally nontrivial: an LSTM trained on identical token sequences degenerates to a majority-class predictor (ROC-AUC approximately 0.50, F1 = 0.91 by always predicting "attack"), proving that RWKV's causal pre-training provides an inductive bias unavailable to direct classifiers. Supervised fine-tuning further raises PR-AUC to 0.94 and ROC-AUC to 0.75, with a precision of 97.7% at the calibrated operating threshold. The RWKV backbone's O(T) recurrent inference enables per-packet streaming without flow buffering, making PLM-NIDS operationally viable at line rate. Because it reads only IP/TCP/UDP headers, it is inherently encryption-agnostic: TLS 1.3, QUIC, and future encrypted protocols are handled transparently.
This study presents a novel hybrid prognostic framework for uncertainty-aware Remaining Useful Life (RUL) estimation in turbofan engines using the NASA C-MAPSS dataset. The framework employs a state-aware strategy that bifurcates the engines operational lifespan into "healthy" and "degraded" regimes. An LSTM-based autoencoder, trained strictly on nominal data (RUL > 150 cycles), monitors reconstruction error to act as a robust state classifier. For the healthy regime, a Conditional Weibull Survival Analysis is used for Mean Residual Life estimation. For the degraded regime, a Probabilistic Neural Network with Monte Carlo Dropout captures both aleatoric and epistemic uncertainties. Rather than using rigid binary labels, a calibrated sigmoid function converts the autoencoders output into continuous state probabilities, dynamically weighting the final ensemble prediction. The primary strength of this framework is its generation of physically consistent uncertainty bands, yielding high-confidence predictions near end-of-life while accurately reflecting the inherent variance of early operation, providing a robust tool for risk-informed maintenance.
We are witnessing a surge in observations of the cosmic dawn (CD) and epoch of reionisation (EoR), driving an increasing demand for fast and robust theoretical interpretation frameworks. In response, machine learning (ML), and emulation in particular, has emerged as a powerful approach to accelerate and enhance inference pipelines. In this work, we present 21cmEMUv3, an emulator trained on 21cmFASTv3 simulations that model both atomically and molecularly cooling galaxies. 21cmEMUv3 is conditioned on $σ_8$ and ten astrophysical parameters to produce seven summary observables: (i) the cylindrical 21cm power spectrum (PS), emulated for the first time at such high resolution and accuracy across a wide redshift range of $z \sim$ 6--30; (ii) the spherically-averaged 21cm PS; (iii) the mean neutral fraction of the intergalactic medium (IGM); (iv) the mean 21cm spin temperature; (v) the global 21cm signal; (vi) the ultraviolet (UV) luminosity functions (LFs); and (vii) the Thomson scattering optical depth. Notably, the cylindrical 21cm PS is emulated via score-based diffusion, while the remaining six summaries are emulated via long-short term memory (LSTM) networks, all achieving sub-percent median accuracy. We use the emulator to reinterpret current 21cm PS upper limits from HERA, for the first time using state-of-the-art hydrodynamical simulations to inform priors on star formation inside molecularly cooling galaxies. We find that our inferred soft-band X-ray luminosity per unit star formation rate is consistent with extrapolations of high-mass X-ray binaries to the low-metallicity regimes expected in the first galaxies, excluding values below $10^{39.2}$ erg s$^{-1}M^{-1}_\odot \rm{yr}$ at $95\%$ confidence. Finally, we produce forecasts for the detection of the cosmic 21cm PS with the Square Kilometre Array for different array configurations. The 21cmEMU package is publicly available.
Alarm fatigue in intensive care units (ICUs) is a well documented patient safety crisis. Clinical monitors generate 350 or more alarms per patient per day, out of which 72-99% are clinically irrelevant. Staff desensitization to non-actionable alarms increases the risk of missed true emergencies. This paper presents SigmaMedStat, a machine learning system that evaluates the trustworthiness of physiological alarm signals before clinical action is taken. Four approaches were evaluated on the PhysioNet/Computing in Cardiology Challenge 2015 dataset of 498 four-channel ICU alarm recordings. Primary contribution is a temporal modeling framework that splits each 60 second recording into six consecutive 10-second chunks, and this in turn generates Continuous Wavelet Transform (CWT) scalograms per chunk, encodes each chunk with a shared EfficientNet-B0 encoder, and passes the resulting feature sequence to a two-layer Long Short-Term Memory (LSTM) network. Five-fold stratified cross-validation yields a mean AUC of 0.822 +/- 0.016 (95% CI: [0.790,0.853]), compared to 0.641 for a static EfficientNet baseline trained on the full 60-second window. Ablation studies confirm that temporal chunking and multi-channel signal fusion both contribute independently to classification performance. Per-alarm type analysis reveals that Ventricular Flutter is the most accurately classified alarm type (AUC 0.820) while Asystole remains the hardest (AUC 0.722). Error analysis identifies 65 false negatives and 85 high-confidence misclassifications as the primary failure modes. All code and results are publicly available at https://github.com/Arun-K-Ram/sigmamedstat.
Continuous brain-computer interfaces (BCIs) that decode motion trajectories from imagined movement offer intuitive motor control, yet how feedback modality and longitudinal training shape neural representations and decoding performance remains poorly understood. We present the first systematic investigation of embodied virtual reality (VR) feedback during real-time 3D virtual limb control driven by motor imagery, across ten longitudinal sessions in ten participants. Performance was evaluated using three strategies: actual online performance (Fixed Decoder Generalisation, FDG), periodic retraining (Sequential Adaptive Training, SAT), and within-session upper-bound estimation (Within-Session Reconstruction, WSR). A CNN-LSTM decoder achieved within-session imagined movement correlations of r = 0.762 under VR and r = 0.672 under screen feedback. VR significantly outperformed screen feedback across all strategies and movement dimensions (improvements of 8.9-13.0%, all p <= 0.002, d = 1.42-2.05). This advantage persisted under fixed decoders without retraining, demonstrating that embodied VR feedback elicits inherently more decodable and generalisable neural representations. Linear mixed-effects modelling confirmed robust main effects of feedback modality and movement axis with no interaction. Neurophysiologically, VR produced stronger sensorimotor-parietal desynchronisation and enhanced motor-frontal functional connectivity, with pervasive anterior insula engagement across all frequency bands and increased superior parietal lobule coupling, paralleling patterns associated with real movement execution. These findings establish embodied spatial feedback as a key design principle for next-generation continuous BCIs targeting intuitive motor control and neurorehabilitation.
The proliferation of AI-generated synthetic media poses a critical threat to the integrity of digital evidence in legal and forensic contexts. Existing deepfake detection systems typically address a single modality and provide no mechanism for tamper-proof evidence preservation. We present DeepFake Forensics AI, a unified platform that detects synthetic media across image, video, and audio modalities, identifies generative architecture fingerprints, and anchors forensic evidence immutably on the Ethereum blockchain. Our system trains four independent neural networks from scratch: an EfficientNet-B4 image detector (AUC = 0.9868), a Bidirectional LSTM video detector (AUC= 0.9628), an ECAPA-TDNN audio detector (EER = 18.63%), and a novel GAN fingerprinting module (accuracy = 99.88%) that identifies the generative architecture behind a fake image. Evidence files are hashed with SHA-256, stored on IPFS via Pinata, and registered on-chain via a Solidity smart contract with role-based access control. The platform provides a React frontend and FastAPI backend suitable for deployment in forensic and legal workflows. To our knowledge, this is the first system to unify multi-modal deepfake detection with blockchain-based chain-of custody management.
We present a unified experiment, analysis, and benchmark study of multivariate time-series (MTS) anomaly detection. Ten family-representative detectors -- spanning statistical, reconstruction, association, frequency, and generic-transformer families -- are evaluated on five datasets (SMD, MSL, SMAP, PSM, and MSDS) under effectiveness, efficiency, robustness, and cross-dataset generalisation. All methods share the same windowing, scoring, hardware, and metric protocols. Effectiveness, ablation, and robustness use three random seeds; cross-dataset transfer uses seed~0 because each extra seed requires $250$ source-target evaluations. The benchmark yields three method-independent findings: no single-bias baseline dominates; absolute perturbation VUS-ROC is more informative than retention ratios; and MSDS behaves as an event-dense deployment workload rather than a sparse point-anomaly benchmark. Under this protocol we also introduce \ours{}, an adaptive detector family combining a NOTEARS-constrained directed channel-graph view with optional patch-attention and temporal-association views. \ours{} achieves the best macro-average VUS-ROC ($0.675$, $+5.1$~pt over the second-best LSTM-AE), ranks first overall, and reaches the top-3 on all five datasets. Its wins on MSL and MSDS are narrow, while its average and robustness gains are larger: under the same three-seed robustness protocol for every method, it obtains the strongest absolute VUS-ROC across noise, channel dropout, and time-shift perturbations. We release the MSDS preprocessing protocol, configurations, scripts, and seed-level metric dumps.