Abstract: Codification of free-text clinical narratives has long been recognised as beneficial for secondary uses such as funding, insurance claim processing and research. Codes are currently assigned manually, a process that is expensive, time-consuming and error-prone. In recent years, many researchers have studied the use of Natural Language Processing (NLP) and related Machine Learning (ML) and Deep Learning (DL) methods to address the problems of manual coding of clinical narratives and to help human coders assign clinical codes more accurately and efficiently. This systematic literature review provides a comprehensive overview of automated clinical coding systems that utilise NLP, ML and DL methods to assign ICD codes to discharge summaries. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and conducted a comprehensive search of publications from January 2010 to December 2020 in four academic databases: PubMed, ScienceDirect, the Association for Computing Machinery (ACM) Digital Library, and the Association for Computational Linguistics (ACL) Anthology. We reviewed 7,556 publications; 38 met the inclusion criteria. This review identifies the datasets containing discharge summaries, the NLP techniques and other data extraction processes, the feature extraction and embedding techniques, and the evaluation metrics used to measure the performance of the classification methods. Lastly, future research directions are provided for scholars who are interested in automated ICD code assignment. Efforts are still required to improve ICD code prediction accuracy and the availability of large-scale de-identified clinical corpora coded with the latest version of the classification system. This review can serve as a platform to guide and share knowledge with less experienced coders and researchers.
Abstract: We present a new method by which the total masses of galaxies, including dark matter, can be estimated from the kinematics of their globular cluster systems (GCSs). In the proposed method, we apply convolutional neural networks (CNNs) to the two-dimensional (2D) maps of line-of-sight velocities ($V$) and velocity dispersions ($\sigma$) of GCSs predicted from numerical simulations of disk and elliptical galaxies. We first train the CNN using a large number ($\sim 200,000$) of the synthesized 2D maps, either of $\sigma$ only ("one-channel") or of both $\sigma$ and $V$ ("two-channel"). We then test the CNN by using it to predict the total masses of galaxies for an unseen dataset that is not used in training. The principal results show that the overall accuracy for one-channel and two-channel data is 97.6\% and 97.8\% respectively, which suggests that the new method is promising. The mean absolute errors (MAEs) for one-channel and two-channel data are 0.288 and 0.275 respectively, and the root mean square errors (RMSEs) are 0.539 and 0.51 respectively. The smaller MAE and RMSE for two-channel data (i.e., better performance) suggest that the new method can properly account for the global rotation of GCSs in the mass estimation. We stress that the prediction accuracy of the new mass estimation method depends not only on the architecture of the CNN but can also be affected by the introduction of noise in the synthesized images.
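The two error measures quoted in the abstract above, MAE and RMSE, can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the paper; the abstract does not state the units or scale of the predicted masses, so the numbers here only show how the two quantities are computed and why RMSE is never smaller than MAE.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalises large errors more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical targets and predictions (illustrative values only, not from the paper).
y_true = [11.0, 11.5, 12.0, 12.5]
y_pred = [11.0, 12.0, 12.0, 11.5]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_square_error(y_true, y_pred)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Because RMSE squares each error before averaging, a few large misses raise it well above the MAE, which is consistent with the reported RMSEs being larger than the corresponding MAEs.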