Abstract:The evolution of Vision Transformers has led to their widespread adaptation to different domains. Despite large-scale success, there remain significant challenges including their reliance on extensive computational and memory resources for pre-training on huge datasets as well as difficulties in task-specific transfer learning. These limitations coupled with energy inefficiencies mainly arise due to the computation-intensive self-attention mechanism. To address these issues, we propose a novel Super-Pixel Based Patch Pooling (SPPP) technique that generates context-aware, semantically rich, patch embeddings to effectively reduce the architectural complexity and improve efficiency. Additionally, we introduce the Light Latent Attention (LLA) module in our pipeline by integrating latent tokens into the attention mechanism allowing cross-attention operations to significantly reduce the time and space complexity of the attention module. By leveraging the data-intuitive patch embeddings coupled with dynamic positional encodings, our approach adaptively modulates the cross-attention process to focus on informative regions while maintaining the global semantic structure. This targeted attention improves training efficiency and accelerates convergence. Notably, the SPPP module is lightweight and can be easily integrated into existing transformer architectures. Extensive experiments demonstrate that our proposed architecture provides significant improvements in terms of computational efficiency while achieving comparable results with the state-of-the-art approaches, highlighting its potential for energy-efficient transformers suitable for edge deployment. (The code is available on our GitHub repository: https://github.com/zser092/Focused-Attention-ViT).
Abstract:The objective of this paper is to enhance the optimization process for neural networks by developing a dynamic learning rate algorithm that effectively integrates exponential decay and advanced anti-overfitting strategies. Our primary contribution is the establishment of a theoretical framework where we demonstrate that the optimization landscape, under the influence of our algorithm, exhibits unique stability characteristics defined by Lyapunov stability principles. Specifically, we prove that the superlevel sets of the loss function, as influenced by our adaptive learning rate, are always connected, ensuring consistent training dynamics. Furthermore, we establish the "equiconnectedness" property of these superlevel sets, which maintains uniform stability across varying training conditions and epochs. This paper contributes to the theoretical understanding of dynamic learning rate mechanisms in neural networks and also pave the way for the development of more efficient and reliable neural optimization techniques. This study intends to formalize and validate the equiconnectedness of loss function as superlevel sets in the context of neural network training, opening newer avenues for future research in adaptive machine learning algorithms. We leverage previous theoretical discoveries to propose training mechanisms that can effectively handle complex and high-dimensional data landscapes, particularly in applications requiring high precision and reliability.
Abstract:Background: The reproducibility of machine-learning models in prostate cancer detection across different MRI vendors remains a significant challenge. Methods: This study investigates Support Vector Machines (SVM) and Random Forest (RF) models trained on radiomic features extracted from T2-weighted MRI images using Pyradiomics and MRCradiomics libraries. Feature selection was performed using the maximum relevance minimum redundancy (MRMR) technique. We aimed to enhance clinical decision support through multimodal learning and feature fusion. Results: Our SVM model, utilizing combined features from Pyradiomics and MRCradiomics, achieved an AUC of 0.74 on the Multi-Improd dataset (Siemens scanner) but decreased to 0.60 on the Philips test set. The RF model showed similar trends, with notable robustness for models using Pyradiomics features alone (AUC of 0.78 on Philips). Conclusions: These findings demonstrate the potential of multimodal feature integration to improve the robustness and generalizability of machine-learning models for clinical decision support in prostate cancer detection. This study marks a significant step towards developing reliable AI-driven diagnostic tools that maintain efficacy across various imaging platforms.