Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fabio Martínez

A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification

Jan 30, 2026

Juan A. Olmos, Antoine Manzanera, Fabio Martínez

Abstract:Prostate cancer (PCa) is one of the most common cancers in men worldwide. Bi-parametric MRI (bp-MRI) and clinical variables are crucial for PCa identification and improving treatment decisions. However, this process is subjective to expert interpretations. Furthermore, most existing computer-aided diagnosis methods focus on imaging-based models, overlooking the clinical context and suffering from data scarcity, limiting their ability to learn robust representations. We propose a geometric multimodal Foundation Model (FM), named MFM-Geom, that learns representations from bp-MRI and clinical reports, encoding visual findings and information from the context of clinical variables. In the representations classification head, the approach leverages symmetric positive definite (SPD) matrices and Riemannian deep learning to integrate imaging-text representations from a biomedical multimodal FM. Using 10% of the training data, MFM-Geom outperformed baseline class token embedding-based classification (+8.3%, AUC-PR of 90.67). Generalization on external dataset confirmed the robustness of fine-tuning biomedical FM, achieving an AUC-PR of 90.6.

* Accepted at IEEE International Symposium on Biomedical Imaging (ISBI) 2026

Via

Access Paper or Ask Questions

A Second-Order Attention Mechanism For Prostate Cancer Segmentation and Detection in Bi-Parametric MRI

Nov 07, 2025

Mateo Ortiz, Juan Olmos, Fabio Martínez

Abstract:The detection of clinically significant prostate cancer lesions (csPCa) from biparametric magnetic resonance imaging (bp-MRI) has emerged as a noninvasive imaging technique for improving accurate diagnosis. Nevertheless, the analysis of such images remains highly dependent on the subjective expert interpretation. Deep learning approaches have been proposed for csPCa lesions detection and segmentation, but they remain limited due to their reliance on extensively annotated datasets. Moreover, the high lesion variability across prostate zones poses additional challenges, even for expert radiologists. This work introduces a second-order geometric attention (SOGA) mechanism that guides a dedicated segmentation network, through skip connections, to detect csPCa lesions. The proposed attention is modeled on the Riemannian manifold, learning from symmetric positive definitive (SPD) representations. The proposed mechanism was integrated into standard U-Net and nnU-Net backbones, and was validated on the publicly available PI-CAI dataset, achieving an Average Precision (AP) of 0.37 and an Area Under the ROC Curve (AUC-ROC) of 0.83, outperforming baseline networks and attention-based methods. Furthermore, the approach was evaluated on the Prostate158 dataset as an independent test cohort, achieving an AP of 0.37 and an AUC-ROC of 0.75, confirming robust generalization and suggesting discriminative learned representations.

* Accepted at the 28th Iberoamerican Congress on Pattern Recognition (CIARP 2025). To appear in Lecture Notes in Computer Science (LNCS), Springer

Via

Access Paper or Ask Questions

A multitask transformer to sign language translation using motion gesture primitives

Mar 25, 2025

Fredy Alejandro Mendoza López, Jefferson Rodriguez, Fabio Martínez

Abstract:The absence of effective communication the deaf population represents the main social gap in this community. Furthermore, the sign language, main deaf communication tool, is unlettered, i.e., there is no formal written representation. In consequence, main challenge today is the automatic translation among spatiotemporal sign representation and natural text language. Recent approaches are based on encoder-decoder architectures, where the most relevant strategies integrate attention modules to enhance non-linear correspondences, besides, many of these approximations require complex training and architectural schemes to achieve reasonable predictions, because of the absence of intermediate text projections. However, they are still limited by the redundant background information of the video sequences. This work introduces a multitask transformer architecture that includes a gloss learning representation to achieve a more suitable translation. The proposed approach also includes a dense motion representation that enhances gestures and includes kinematic information, a key component in sign language. From this representation it is possible to avoid background information and exploit the geometry of the signs, in addition, it includes spatiotemporal representations that facilitate the alignment between gestures and glosses as an intermediate textual representation. The proposed approach outperforms the state-of-the-art evaluated on the CoL-SLTD dataset, achieving a BLEU-4 of 72,64% in split 1, and a BLEU-4 of 14,64% in split 2. Additionally, the strategy was validated on the RWTH-PHOENIX-Weather 2014 T dataset, achieving a competitive BLEU-4 of 11,58%.

* 32 pages, 10 tables, 13 figures

Via

Access Paper or Ask Questions

A digital eye-fixation biomarker using a deep anomaly scheme to classify Parkisonian patterns

Feb 25, 2025

Juan Niño, Luis Guayacán, Santiago Gómez, Fabio Martínez

Abstract:Oculomotor alterations constitute a promising biomarker to detect and characterize Parkinson's disease (PD), even in prodromal stages. Currently, only global and simplified eye movement trajectories are employed to approximate the complex and hidden kinematic relationships of the oculomotor function. Recent advances on machine learning and video analysis have encouraged novel characterizations of eye movement patterns to quantify PD. These schemes enable the identification of spatiotemporal segments primarily associated with PD. However, they rely on discriminative models that require large training datasets and depend on balanced class distributions. This work introduces a novel video analysis scheme to quantify Parkinsonian eye fixation patterns with an anomaly detection framework. Contrary to classical deep discriminative schemes that learn differences among labeled classes, the proposed approach is focused on one-class learning, avoiding the necessity of a significant amount of data. The proposed approach focuses only on Parkinson's representation, considering any other class sample as an anomaly of the distribution. This approach was evaluated for an ocular fixation task, in a total of 13 control subjects and 13 patients on different stages of the disease. The proposed digital biomarker achieved an average sensitivity and specificity of 0.97 and 0.63, respectively, yielding an AUC-ROC of 0.95. A statistical test shows significant differences (p < 0.05) among predicted classes, evidencing a discrimination between patients and control subjects.

* 6 pages, 4 images

Via

Access Paper or Ask Questions

APIS: A paired CT-MRI dataset for ischemic stroke segmentation challenge

Sep 26, 2023

Santiago Gómez, Daniel Mantilla, Gustavo Garzón, Edgar Rangel, Andrés Ortiz, Franklin Sierra-Jerez, Fabio Martínez

Abstract:Stroke is the second leading cause of mortality worldwide. Immediate attention and diagnosis play a crucial role regarding patient prognosis. The key to diagnosis consists in localizing and delineating brain lesions. Standard stroke examination protocols include the initial evaluation from a non-contrast CT scan to discriminate between hemorrhage and ischemia. However, non-contrast CTs may lack sensitivity in detecting subtle ischemic changes in the acute phase. As a result, complementary diffusion-weighted MRI studies are captured to provide valuable insights, allowing to recover and quantify stroke lesions. This work introduced APIS, the first paired public dataset with NCCT and ADC studies of acute ischemic stroke patients. APIS was presented as a challenge at the 20th IEEE International Symposium on Biomedical Imaging 2023, where researchers were invited to propose new computational strategies that leverage paired data and deal with lesion segmentation over CT sequences. Despite all the teams employing specialized deep learning tools, the results suggest that the ischemic stroke segmentation task from NCCT remains challenging. The annotated dataset remains accessible to the public upon registration, inviting the scientific community to deal with stroke characterization from NCCT but guided with paired DWI information.

Via

Access Paper or Ask Questions