Abstract:Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries. While automated surgical video analysis has been explored in general surgery, its application to ophthalmic procedures remains limited. Existing works primarily focus on Phaco cataract surgery, an expensive technique not accessible in regions where cataract treatment is most needed. In contrast, Manual Small-Incision Cataract Surgery (MSICS) is the preferred low-cost, faster alternative in high-volume settings and for challenging cases. However, no dataset exists for MSICS. To address this gap, we introduce Cataract-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level. We benchmark this dataset on state-of-the-art models and present ToolSeg, a novel framework that enhances tool segmentation by introducing a phase-conditional decoder and a simple yet effective semi-supervised setup leveraging pseudo-labels from foundation models. Our approach significantly improves segmentation performance, achieving a $23.77\%$ to $38.10\%$ increase in mean Dice scores, with a notable boost for tools that are less prevalent and small. Furthermore, we demonstrate that ToolSeg generalizes to other surgical settings, showcasing its effectiveness on the CaDIS dataset.
Abstract:One of the major challenges of the twenty-first century is climate change, evidenced by rising sea levels, melting glaciers, and increased storm frequency. Accurate temperature forecasting is vital for understanding and mitigating these impacts. Traditional data-driven models often use recurrent neural networks (RNNs) but face limitations in parallelization, especially with longer sequences. To address this, we introduce a novel approach based on the FocalNet Transformer architecture. Our Focal modulation Attention Encoder (FATE) framework operates in a multi-tensor format, utilizing tensorized modulation to capture spatial and temporal nuances in meteorological data. Comparative evaluations against existing transformer encoders, 3D CNNs, LSTM, and ConvLSTM models show that FATE excels at identifying complex patterns in temperature data. Additionally, we present a new labeled dataset, the Climate Change Parameter dataset (CCPD), containing 40 years of data from Jammu and Kashmir on seven climate-related parameters. Experiments with real-world temperature datasets from the USA, Canada, and Europe show accuracy improvements of 12\%, 23\%, and 28\%, respectively, over current state-of-the-art models. Our CCPD dataset also achieved a 24\% improvement in accuracy. To support reproducible research, we have released the source code and pre-trained FATE model at \href{https://github.com/Tajamul21/FATE}{https://github.com/Tajamul21/FATE}.
Abstract:In clinical applications, X-ray technology is vital for noninvasive examinations like mammography, providing essential anatomical information. However, the radiation risk associated with X-ray procedures raises concerns. X-ray reconstruction is crucial in medical imaging for detailed visual representations of internal structures, aiding diagnosis and treatment without invasive procedures. Recent advancements in deep learning (DL) have shown promise in X-ray reconstruction, but conventional DL methods often require centralized aggregation of large datasets, leading to domain shifts and privacy issues. To address these challenges, we introduce the Hierarchical Framework-based Federated Learning method (HF-Fed) for customized X-ray imaging. HF-Fed tackles X-ray imaging optimization by decomposing the problem into local data adaptation and holistic X-ray imaging. It employs a hospital-specific hierarchical framework and a shared common imaging network called Network of Networks (NoN) to acquire stable features from diverse data distributions. The hierarchical hypernetwork extracts domain-specific hyperparameters, conditioning the NoN for customized X-ray reconstruction. Experimental results demonstrate HF-Fed's competitive performance, offering a promising solution for enhancing X-ray imaging without data sharing. This study significantly contributes to the literature on federated learning in healthcare, providing valuable insights for policymakers and healthcare providers. The source code and pre-trained HF-Fed model are available at \url{https://tisharepo.github.io/Webpage/}.
Abstract:We focus on the problem of Unsupervised Domain Adaptation (\uda) for breast cancer detection from mammograms (BCDM) problem. Recent advancements have shown that masked image modeling serves as a robust pretext task for UDA. However, when applied to cross-domain BCDM, these techniques struggle with breast abnormalities such as masses, asymmetries, and micro-calcifications, in part due to the typically much smaller size of region of interest in comparison to natural images. This often results in more false positives per image (FPI) and significant noise in pseudo-labels typically used to bootstrap such techniques. Recognizing these challenges, we introduce a transformer-based Domain-invariant Mask Annealed Student Teacher autoencoder (D-MASTER) framework. D-MASTER adaptively masks and reconstructs multi-scale feature maps, enhancing the model's ability to capture reliable target domain features. D-MASTER also includes adaptive confidence refinement to filter pseudo-labels, ensuring only high-quality detections are considered. We also provide a bounding box annotated subset of 1000 mammograms from the RSNA Breast Screening Dataset (referred to as RSNA-BSD1K) to support further research in BCDM. We evaluate D-MASTER on multiple BCDM datasets acquired from diverse domains. Experimental results show a significant improvement of 9% and 13% in sensitivity at 0.3 FPI over state-of-the-art UDA techniques on publicly available benchmark INBreast and DDSM datasets respectively. We also report an improvement of 11% and 17% on In-house and RSNA-BSD1K datasets respectively. The source code, pre-trained D-MASTER model, along with RSNA-BSD1K dataset annotations is available at https://dmaster-iitd.github.io/webpage.