Abstract:Accurate fMRI analysis requires sensitivity to temporal structure across multiple scales, as BOLD signals encode cognitive processes that emerge from fast transient dynamics to slower, large-scale fluctuations. Existing deep learning (DL) approaches to temporal modeling face challenges in jointly capturing these dynamics over long fMRI time series. Among current DL models, transformers address long-range dependencies by explicitly modeling pairwise interactions through attention, but the associated quadratic computational cost limits effective integration of temporal dependencies across long fMRI sequences. Selective state-space models (SSMs) instead model long-range temporal dependencies implicitly through latent state evolution in a dynamical system, enabling efficient propagation of dependencies over time. However, recent SSM-based approaches for fMRI commonly operate on derived functional connectivity representations and employ single-scale temporal processing. These design choices constrain the ability to jointly represent fast transient dynamics and slower global trends within a single model. We propose NeuroSSM, a selective state-space architecture designed for end-to-end analysis of raw BOLD signals in fMRI time series. NeuroSSM addresses the above limitations through two complementary design components: a multiscale state-space backbone that captures fast and slow dynamics concurrently, and a parallel differencing branch that increases sensitivity to transient state changes. Experiments on clinical and non-clinical datasets demonstrate that NeuroSSM achieves competitive performance and efficiency against state-of-the-art fMRI analysis methods.
Abstract:Medical image reconstruction from undersampled acquisitions is an ill-posed problem that involves inversion of the imaging operator linking measurement and image domains. In recent years, physics-driven (PD) models have gained prominence in learning-based reconstruction given their enhanced balance between efficiency and performance. For reconstruction, PD models cascade data-consistency modules that enforce fidelity to acquired data based on the imaging operator, with network modules that process feature maps to alleviate image artifacts due to undersampling. Success in artifact suppression inevitably depends on the ability of the network modules to tease apart artifacts from underlying tissue structures, both of which can manifest contextual relations over broad spatial scales. Convolutional modules that excel at capturing local correlations are relatively insensitive to non-local context. While transformers promise elevated sensitivity to non-local context, practical implementations often suffer from a suboptimal trade-off between local and non-local sensitivity due to intrinsic model complexity. Here, we introduce a novel physics-driven autoregressive state space model (MambaRoll) for enhanced fidelity in medical image reconstruction. In each cascade of an unrolled architecture, MambaRoll employs an autoregressive framework based on physics-driven state space modules (PSSM), where PSSMs efficiently aggregate contextual features at a given spatial scale while maintaining fidelity to acquired data, and autoregressive prediction of next-scale feature maps from earlier spatial scales enhance capture of multi-scale contextual features. Demonstrations on accelerated MRI and sparse-view CT reconstructions indicate that MambaRoll outperforms state-of-the-art PD methods based on convolutional, transformer and conventional SSM modules.