Abstract: Electronic healthcare records (EHRs) contain a wealth of data that can support the prediction of clinical outcomes. EHR data is often stored and analysed using clinical codes (ICD10, SNOMED); however, these can differ across registries and healthcare providers. Integrating data across systems involves mapping between different clinical ontologies, which requires domain expertise and at times results in data loss. To overcome this, code-agnostic models have been proposed. We assess the effectiveness of a code-agnostic representation approach on the task of long-term microvascular complication prediction for individuals living with Type 2 Diabetes. Our method encodes individual EHRs as text using fine-tuned, pretrained clinical language models. Leveraging large-scale EHR data from the UK, we employ a multi-label approach to simultaneously predict the risk of microvascular complications across 1-, 5-, and 10-year windows. We demonstrate that a code-agnostic approach outperforms a code-based model and illustrate that performance is better with longer prediction windows but is biased towards the first-occurring complication. Overall, we find that context length is vitally important for model performance. This study highlights the possibility of including data from across different clinical ontologies and is a starting point for generalisable clinical models.
Abstract: Developing artificial intelligence (AI) tools for healthcare is a multidisciplinary effort, bringing together data scientists, clinicians, patients and people from other disciplines. In this paper, we explore the AI development workflow and how participants navigate the challenges and tensions of sharing and generating knowledge across disciplines. Based on an inductive thematic analysis of 13 semi-structured interviews with participants in a large research consortium, our findings suggest that multidisciplinarity heavily impacts work practices. Participants faced challenges in learning the languages of other disciplines and needed to adapt the tools used for sharing and communicating to their audience, particularly those from a clinical or patient perspective. Large health datasets also posed certain restrictions on work practices. We identified meetings as a key platform for facilitating exchanges between disciplines and allowing for the blending and creation of knowledge. Finally, we discuss design implications for data science and collaborative tools, and recommendations for future research.