Abstract: The integration of Artificial Intelligence (AI) in healthcare offers transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have demonstrated capabilities in supporting medical decision-making, and embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, yet few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency for outpatient triage, this study evaluates the consistency of responses provided by ChatGPT in outpatient guidance, covering both within-version response analysis and between-version comparisons. For the within-version analysis, the results indicate that the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and both show moderate consistency in their top recommendations (71.2% for 4.0 and 59.6% for 3.5). However, the between-version consistency is relatively low (mean consistency score = 1.43/3, median = 1), indicating that few recommendations match between the two versions; only 50% of top recommendations match perfectly. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations and help delineate the potential and limitations of LLMs in healthcare. Future research may focus on optimizing LLM and AI integration in healthcare systems based on ergonomic and human-factors principles, aligning them with the specific needs of effective outpatient triage.
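The abstract does not detail the scoring procedure; as a minimal illustration, a within-version top-recommendation consistency check of the kind reported above could be computed as follows. The function name, the majority-vote criterion, and the example data are our assumptions, not the study's protocol.

```python
from collections import Counter

def top_recommendation_consistency(responses: list[str]) -> float:
    """Fraction of repeated queries whose top department recommendation
    agrees with the modal (most frequent) answer for the same case.

    Hypothetical sketch; the study's actual scoring rule may differ.
    """
    if not responses:
        return 0.0
    _, modal_count = Counter(responses).most_common(1)[0]
    return modal_count / len(responses)

# Hypothetical example: five repeated queries for one patient vignette
runs = ["Cardiology", "Cardiology", "Internal Medicine",
        "Cardiology", "Cardiology"]
print(top_recommendation_consistency(runs))  # 0.8
```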
Abstract: With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (such as computer vision and natural language processing), multi-modal pre-trained big models have drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models, hoping to provide new insights and help new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training work in natural language processing, computer vision, and speech. We then introduce the task definition, key challenges, and advantages of multi-modal pre-trained models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used to validate large-scale MM-PTMs, including generative, classification, and regression tasks, and provide visualizations and analyses of model parameters and of results on representative downstream tasks. Finally, we point out possible research directions that may benefit future work. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey
Abstract: Class Activation Mapping (CAM) has been widely adopted to generate saliency maps that provide visual explanations for deep neural networks (DNNs). The saliency maps are conventionally generated by fusing the channels of the target feature map using a weighted-average scheme. This is a weak model of the inter-channel relation, in the sense that it models the relation among channels only in a contrastive way (i.e., channels that play key roles in the prediction are given higher weights so that they stand out in the fusion). The collaborative relation, in which channels work together to provide cross-reference, has been ignored, and the intra-channel relation has been neglected entirely. In this paper, we address this problem by introducing Conceptor learning into CAM generation. Conceptor learning was originally proposed to model the patterns of state changes in recurrent neural networks (RNNs). By relaxing the dependency of Conceptor learning on RNNs, we make Conceptor-CAM not only generalizable to more DNN architectures but also able to learn both the inter- and intra-channel relations for better saliency map generation. Moreover, we enable the use of Boolean operations to combine positive and pseudo-negative evidence, which makes CAM inference more robust and comprehensive. The effectiveness of Conceptor-CAM has been validated with both formal verification and experiments on the largest-scale dataset in the literature. The experimental results show that Conceptor-CAM is compatible with, and brings significant improvements to, all well-recognized CAM-based methods, outperforming the state-of-the-art by 43.14%~72.79% (88.39%~168.15%) in Average Increase (Drop) on ILSVRC2012, 15.42%~42.55% (47.09%~372.09%) on VOC, and 17.43%~31.32% (47.54%~206.45%) on COCO, respectively.
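For context, the conventional weighted-average channel fusion that the paper identifies as a weak inter-channel model can be sketched as below. This is a minimal illustration of that baseline, not of Conceptor-CAM itself; the function and variable names are ours.

```python
import numpy as np

def weighted_average_cam(feature_map: np.ndarray,
                         weights: np.ndarray) -> np.ndarray:
    """Conventional CAM fusion: a weighted average over the channels
    of the target feature map, followed by ReLU and normalization.

    feature_map: (C, H, W) activations of the target layer.
    weights:     (C,) per-channel importance scores (class weights in
                 the original CAM, pooled gradients in Grad-CAM, etc.).
    """
    # Contrastive fusion: channels deemed important get larger weights
    saliency = np.tensordot(weights, feature_map, axes=([0], [0]))  # (H, W)
    saliency = np.maximum(saliency, 0)        # keep positive evidence only
    if saliency.max() > 0:
        saliency = saliency / saliency.max()  # scale to [0, 1]
    return saliency
```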