Abstract: AI-based analysis of histopathology whole slide images (WSIs) is central to computational pathology. However, image quality can impact model performance. Here, we investigate to what extent unsharp areas of WSIs impact the classification performance of deep convolutional neural networks. We propose a multi-model approach, DeepBlurMM, to alleviate the impact of unsharp image areas and improve model performance. DeepBlurMM uses cut-offs on sigma, the standard deviation of the Gaussian blur, to determine the most suitable model for predicting tiles with various levels of blurring within a single WSI. Specifically, the cut-offs categorise tiles as sharp or slightly blurred, moderately blurred, or highly blurred, and each blur level has a corresponding model that is selected for tile-level predictions. In a simulation study, we demonstrated the application of DeepBlurMM in a binary classification task for breast cancer Nottingham Histological Grade 1 vs 3. Performance, evaluated over 5-fold cross-validation, showed that DeepBlurMM outperformed the base model under moderate-blur and mixed-blur conditions. Unsharp image tiles (local blurriness) at prediction time reduced model performance, and the proposed multi-model approach improved performance under some conditions, with the potential to improve prediction quality in both research and clinical applications.
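The abstract describes a routing scheme in which each tile is sent to the model trained for its blur level. Below is a minimal sketch of that logic, assuming hypothetical cut-off values and names (`SIGMA_CUTOFFS`, `select_model`, `predict_wsi`, and the model dictionary keys are ours, not from the paper; the actual thresholds and model interfaces are not given in the abstract):

```python
# Assumed sigma cut-offs separating the three blur levels; the values
# actually used by DeepBlurMM are not stated in the abstract.
SIGMA_CUTOFFS = (1.0, 2.0)

def select_model(sigma, models):
    """Route a tile to the model trained for its blur level.

    `sigma` is the estimated standard deviation of the Gaussian blur
    affecting the tile; `models` maps blur-level names to trained
    classifiers exposing a `predict` method.
    """
    low, high = SIGMA_CUTOFFS
    if sigma <= low:
        return models["sharp_or_slight"]
    if sigma <= high:
        return models["moderate"]
    return models["high"]

def predict_wsi(tiles, sigmas, models):
    """Tile-level predictions for one WSI, one model per blur level."""
    return [select_model(s, models).predict(t)
            for t, s in zip(tiles, sigmas)]
```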
Abstract: This study is a pioneering effort to investigate the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering, with a focus on mechanics. Our examination involves a manually crafted exam of 126 multiple-choice questions spanning various mechanics courses, including Fluid Mechanics, Mechanical Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of Elasticity, and Continuum Mechanics. Three LLMs, namely ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), were evaluated against engineering faculty and students with or without a mechanical engineering background. The findings reveal GPT-4's superior performance over the other two LLMs and the human cohorts in answering questions across various mechanics topics, except for Continuum Mechanics, which signals potential future improvements for GPT models in handling symbolic calculations and tensor analyses. The performance of all LLMs improved significantly when explanations were prompted prior to direct answers, underscoring the crucial role of prompt engineering. Interestingly, GPT-3.5 performs better with prompts covering a broader domain, while GPT-4 excels with prompts focusing on specific subjects. Finally, GPT-4 exhibits notable advancements in mitigating input bias, as evidenced by comparison with the guessing preferences observed among the human cohorts. This study unveils the substantial potential of LLMs as highly knowledgeable assistants in both mechanical engineering pedagogy and scientific research.
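To make the "explanation before answer" prompting finding concrete, here is a minimal sketch of such a prompt template, assuming a hypothetical format (`build_prompt`, the wording, and the sample question are ours; the authors' exact prompts are not given in the abstract):

```python
def build_prompt(question: str, choices: list[str]) -> str:
    """Assemble a multiple-choice prompt that asks the model to
    explain its reasoning before committing to a final letter."""
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return (
        "You are answering a conceptual mechanics question.\n"
        f"Question: {question}\n"
        f"Options:\n{options}\n"
        "First explain your reasoning step by step, "
        "then state the final answer as a single letter."
    )

# Example usage with an illustrative fluid mechanics question.
print(build_prompt(
    "For steady, incompressible flow through a converging nozzle, "
    "what happens to the fluid velocity?",
    ["It decreases", "It increases", "It stays constant", "It oscillates"],
))
```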