Abstract:The recognition of medical entities from natural language is an ubiquitous problem in the medical field, with applications ranging from medical act coding to the analysis of electronic health data for public health. It is however a complex task usually requiring human expert intervention, thus making it expansive and time consuming. The recent advances in artificial intelligence, specifically the raise of deep learning methods, has enabled computers to make efficient decisions on a number of complex problems, with the notable example of neural sequence models and their powerful applications in natural language processing. They however require a considerable amount of data to learn from, which is typically their main limiting factor. However, the C\'epiDc stores an exhaustive database of death certificates at the French national scale, amounting to several millions of natural language examples provided with their associated human coded medical entities available to the machine learning practitioner. This article investigates the applications of deep neural sequence models to the medical entity recognition from natural language problem.
Abstract:Underlying cause of death coding from death certificates is a process that is nowadays undertaken mostly by humans with a potential assistance from expert systems such as the Iris software. It is as a consequence an expensive process that can in addition suffer from geospatial discrepancies, thus severely impairing the comparability of death statistics at the international level. The recent advances in artificial intelligence, specifically the raise of deep learning methods, has enabled computers to make efficient decisions on a number of complex problem that were typically considered as out of reach without human assistance. They however require a considerable amount of data to learn from, which is typically their main limiting factor. However, the C\'epiDc stores an exhaustive database of death certificate at the French national scale, amounting to several millions training example available for the machine learning practitioner. This article presents a deep learning based tool for automated coding of the underlying cause of death from the data contained in death certificates with 97.8% accuracy, a substantial achievement compared to the Iris software and its 75% accuracy assessed on the same test examples. Such an improvement opens a whole field of new applications, from nosologist-level batch automated coding to international and temporal harmonization of cause of death statistics.