Abstract: A good medical ontology is expected to cover its domain completely and correctly. On the other hand, large ontologies are hard to build, hard to understand, and hard to maintain. Thus, adding new concepts (often multi-word concepts) to an existing ontology must be done judiciously: only "good" concepts should be added, yet it is difficult to define what makes a concept good. In this research, we propose a metric to measure the goodness of a concept. We identified factors that appear to influence the goodness judgments of medical experts and combined them into a single metric. These factors include the length of the concept name (in words), the frequency with which the concept occurs in the medical literature, and the syntactic categories of its component words. As an additional factor, we used the simplicity of a term after mapping it into a specific foreign language. We performed Bayesian optimization of the factor weights to maximize agreement between the metric and three medical experts. The results showed that our metric achieved an overall agreement of 50.67% with the experts, as measured by Krippendorff's alpha.
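As a rough illustration of how such a metric and its tuning loop could fit together, the sketch below scores each concept as a weighted sum of factor values and measures agreement with Krippendorff's alpha for nominal labels. It assumes the factors are already scaled to [0, 1], that judgments are binary (good/bad), and that the metric is treated as one more coder alongside the experts. The factor names, the 0.5 threshold, and the toy data are hypothetical, and plain random search stands in for the Bayesian optimization used in the study.

```python
import random
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    Each unit is a list with one label per coder; None marks a missing rating.
    """
    coincidences = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two ratings are not pairable
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)
    marginals = Counter()
    for (c, _k), w in coincidences.items():
        marginals[c] += w
    n = sum(marginals.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:
        return 0.0  # alpha is undefined without label variation; this sketch reports 0.0
    return 1.0 - (n - 1) * observed / expected


def goodness(features, weights):
    """Weighted sum of factor values (assumed pre-scaled to [0, 1])."""
    return sum(weights[name] * value for name, value in features.items())


def label(score, threshold=0.5):
    return "good" if score >= threshold else "bad"


def random_search(concepts, expert_labels, factor_names, trials=200, seed=0):
    """Stand-in for Bayesian optimization: sample weights, keep the best alpha."""
    rng = random.Random(seed)
    best_weights, best_alpha = None, float("-inf")
    for _ in range(trials):
        raw = [rng.random() for _ in factor_names]
        total = sum(raw)
        weights = {n: r / total for n, r in zip(factor_names, raw)}  # weights sum to 1
        # The metric's label is appended as an extra coder for each concept.
        units = [experts + [label(goodness(feats, weights))]
                 for feats, experts in zip(concepts, expert_labels)]
        alpha = krippendorff_alpha_nominal(units)
        if alpha > best_alpha:
            best_weights, best_alpha = weights, alpha
    return best_weights, best_alpha


# Hypothetical factor names and toy data, for illustration only.
factors = ["short_name", "log_frequency", "noun_ratio", "foreign_simplicity"]
concepts = [
    {"short_name": 1.0, "log_frequency": 0.9, "noun_ratio": 1.0, "foreign_simplicity": 0.8},
    {"short_name": 0.2, "log_frequency": 0.1, "noun_ratio": 0.5, "foreign_simplicity": 0.3},
    {"short_name": 0.7, "log_frequency": 0.6, "noun_ratio": 0.7, "foreign_simplicity": 0.9},
    {"short_name": 0.3, "log_frequency": 0.2, "noun_ratio": 0.3, "foreign_simplicity": 0.1},
]
experts = [["good", "good", "good"], ["bad", "bad", "good"],
           ["good", "good", "bad"], ["bad", "bad", "bad"]]
weights, alpha = random_search(concepts, experts, factors)
print(weights, round(alpha, 3))
```

Swapping random search for a Bayesian optimizer changes only how candidate weights are proposed; the objective (alpha between the metric and the experts) stays the same.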
Abstract: While the world has been combating COVID-19 for over three years, an ongoing "Infodemic" caused by the spread of fake news about the pandemic has also become a global issue. Fake news affects many aspects of our daily lives, including politics, public health, and economic activity. Readers may mistake fake news for real news and consequently have less access to authentic information. This phenomenon is likely to cause confusion among citizens and conflicts in society. Fake news research currently faces major challenges. It is difficult to accurately identify fake news in social media posts, and timely identification by humans is infeasible because the volume of fake news is overwhelming. Moreover, the topics discussed in fake news are hard to identify because of their similarity to real news. The goal of this paper is to identify fake news on social media to help stop its spread. We present Deep Learning approaches and an ensemble approach for fake news detection. Our detection models achieved higher accuracy than previous studies, and the ensemble approach further improved detection performance. We discovered feature differences between fake news and real news items, and we found that adding these features to the sentence embeddings affected model performance. We applied a hybrid method and built models for recognizing topics in posts. We found that half of the identified topics overlapped between fake news and real news, which could increase confusion in the population.
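The abstract does not say how the ensemble combines its members or which features were added to the embeddings. A common option is soft voting, i.e. averaging the predicted probabilities of several models; the sketch below shows that, together with appending hand-crafted features to a sentence embedding. The feature set, the 0.5 threshold, and all function names are hypothetical, and the embeddings and probabilities are assumed to come from already-trained models.

```python
import re
from statistics import mean


def surface_features(text):
    """Hypothetical surface cues: '!' count, share of all-caps words, URL presence."""
    words = text.split()
    upper_ratio = sum(w.isupper() for w in words) / max(len(words), 1)
    has_url = 1.0 if re.search(r"https?://", text) else 0.0
    return [text.count("!"), upper_ratio, has_url]


def concat_features(embedding, extra_features):
    """Append hand-crafted features to a sentence embedding vector."""
    return list(embedding) + list(extra_features)


def soft_vote(prob_lists, threshold=0.5):
    """Average each model's P(fake) for every post, then threshold the mean."""
    averaged = [mean(probs) for probs in zip(*prob_lists)]
    return ["fake" if p >= threshold else "real" for p in averaged]


# Hypothetical outputs of three detectors for the same three posts.
model_a = [0.9, 0.4, 0.2]
model_b = [0.7, 0.6, 0.1]
model_c = [0.8, 0.3, 0.4]
print(soft_vote([model_a, model_b, model_c]))  # ['fake', 'real', 'real']
```

Soft voting lets a confident model outweigh two uncertain ones; hard (majority) voting on the predicted labels is the usual alternative.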
Abstract: The objectives of this research are 1) to develop an ontology for CDoH by utilizing PubMed articles and ChatGPT; 2) to foster ontology reuse by integrating the CDoH ontology with an existing SDoH ontology into a unified structure; 3) to devise an overarching concept for all non-clinical determinants of health and to create an initial ontology, called N-CODH, for them; and 4) to validate the degree of correspondence between the concepts provided by ChatGPT and the existing SDoH ontology.
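One simple way to quantify correspondence between two sets of concepts is exact matching after light normalization. The sketch below assumes concept names are compared as plain strings; the function names and example terms are hypothetical, and a real alignment would likely also need synonym or embedding-based matching.

```python
import re


def normalize(term):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", term.lower()).split())


def correspondence(chatgpt_terms, ontology_terms):
    """Exact-match overlap between two concept lists after normalization."""
    gpt = {normalize(t) for t in chatgpt_terms}
    onto = {normalize(t) for t in ontology_terms}
    matched = gpt & onto
    union = gpt | onto
    return {
        "matched": sorted(matched),
        "share_of_chatgpt_terms_matched": len(matched) / len(gpt) if gpt else 0.0,
        "jaccard": len(matched) / len(union) if union else 0.0,
    }


# Hypothetical inputs for illustration only.
result = correspondence(
    ["Food insecurity", "Housing instability", "Tobacco marketing"],
    ["food insecurity", "housing-instability", "Unemployment"],
)
print(result["matched"], result["jaccard"])  # ['food insecurity', 'housing instability'] 0.5
```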
Abstract: Clinical factors account for only a small portion, about 10-30%, of the controllable factors that affect an individual's health outcomes. The remaining factors include where a person was born and raised, where they were educated, what their work and family environments are like, etc. These factors are collectively referred to as Social Determinants of Health (SDoH). Most SDoH data is recorded by physicians and practitioners in unstructured clinical notes. Recording SDoH data in a structured manner (in an EHR) could greatly benefit from a dedicated ontology of SDoH terms. Our research focuses on extracting sentences from clinical notes, making use of such an SDoH ontology (called SOHO) to provide appropriate concepts. We utilize recent advances in Deep Learning to optimize the hyperparameters of a Clinical BioBERT model for SDoH text. We implemented a genetic-algorithm-based hyperparameter tuning regimen to identify optimal parameter settings. To implement a complete classifier, we pipelined Clinical BioBERT with two subsequent linear layers and two dropout layers. The output predicts whether a text fragment describes an SDoH issue of the patient. We compared the AdamW, Adafactor, and LAMB optimizers; in our experiments, AdamW outperformed the others in terms of accuracy.
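A genetic algorithm for hyperparameter tuning typically evolves a population of candidate settings through selection, crossover, and mutation. The sketch below is a generic version of that loop, not the study's regimen: the search space, population size, tournament selection, uniform crossover, and the toy fitness function are all assumptions. In practice, the fitness function would fine-tune the Clinical BioBERT classifier with the candidate settings and return its validation accuracy.

```python
import random

# Hypothetical search space; the study's actual hyperparameters and ranges are not given.
SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "batch_size": [8, 16, 32],
}


def random_individual(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each hyperparameter is copied from either parent."""
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each hyperparameter with probability `rate`."""
    return {k: rng.choice(SPACE[k]) if rng.random() < rate else v for k, v in ind.items()}


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, generations=10, pop_size=12, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        next_pop = [pop[i] for i in ranked[:elite]]  # elitism: carry the best over unchanged
        while len(next_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(a, b, rng), rng))
        pop = next_pop
    scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def toy_fitness(h):
    """Placeholder for 'train with h, return validation accuracy'; peaks at 2e-5 / 0.2 / 16."""
    return -(abs(h["learning_rate"] - 2e-5) * 1e4 + abs(h["dropout"] - 0.2)
             + abs(h["batch_size"] - 16) / 100)


best, score = genetic_search(toy_fitness)
print(best, score)
```

Because each real fitness evaluation is a full fine-tuning run, caching scores for settings that have already been evaluated would be worthwhile in practice.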
Abstract: Social determinants of health (SDOH) are societal factors, such as where a person was born, grew up, works, and lives, along with socioeconomic and community factors that affect individual health. SDOH are correlated with many clinical outcomes; hence, it is desirable to record SDOH data in Electronic Health Records (EHRs). Besides storing images, text, etc., EHRs rely on coded terms from standard ontologies and terminologies to record observations and analyses. There is a substantial amount of research on understanding the clinical impact of SDOH, ranging from screening tools to practice-based interventions. However, there is no comprehensive collection of terms for recording SDOH observations in EHRs. Our research goal is to develop an ontology that covers the terms describing SDOH. We present a prototype ontology called Social Determinant of Health Ontology (SOHO) that covers relevant concepts and IS-A relationships describing the impacts and associations of social determinants. We describe the evaluation techniques that we applied to SOHO, including review by human experts and algorithmic evaluation.
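The abstract does not detail SOHO's algorithmic evaluation. As an illustration of the kind of structural check that can be run on an IS-A hierarchy, the sketch below stores child-to-parent IS-A links in a dictionary, collects a concept's ancestors, and tests two basic properties: the hierarchy is acyclic, and every concept reaches the root. The concept names and the checks themselves are hypothetical examples, not SOHO's actual content or evaluation method.

```python
# Hypothetical IS-A links (child -> parents); not SOHO's actual concepts.
ROOT = "Social determinant of health"
IS_A = {
    "Food insecurity": ["Economic stability"],
    "Unemployment": ["Economic stability"],
    "Economic stability": [ROOT],
    "Housing instability": ["Neighborhood and built environment"],
    "Neighborhood and built environment": [ROOT],
}


def ancestors(concept, is_a):
    """All concepts reachable by following IS-A links upward."""
    seen, stack = set(), list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def has_cycle(is_a):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    on_path, done = set(), set()

    def visit(node):
        on_path.add(node)
        for parent in is_a.get(node, []):
            if parent in on_path:
                return True
            if parent not in done and visit(parent):
                return True
        on_path.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(is_a))


def unrooted(is_a, root):
    """Concepts that do not reach the root through IS-A links."""
    concepts = set(is_a) | {p for parents in is_a.values() for p in parents}
    return sorted(c for c in concepts if c != root and root not in ancestors(c, is_a))


print(sorted(ancestors("Food insecurity", IS_A)))
print(has_cycle(IS_A), unrooted(IS_A, ROOT))  # False []
```

Storing parents as lists allows a concept to have several IS-A parents, which ontologies commonly permit.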