Abstract: We develop an algorithm to semantically parse linear ordering problems, which require a model to arrange entities using deductive reasoning. Our method takes as input a number of premises and candidate statements, parses them into a first-order logic of an ordering domain, and then uses constraint logic programming to infer the truth of the proposed statements about the ordering. Our semantic parser transforms Heim and Kratzer's syntax-based compositional formal semantic rules into a computational algorithm. This transformation introduces abstract types and templates based on their rules, as well as a dynamic component that interprets entities within a contextual framework. Our symbolic system, the Formal Semantic Logic Inferer (FSLI), is applied to the multiple-choice problems in BIG-bench's logical_deduction task, achieving perfect accuracy, compared to 67.06% for the best-performing LLM (GPT-4) and 87.63% for the hybrid system Logic-LM. These promising results demonstrate the benefit of developing a semantic parsing algorithm driven by first-order logic constructs.
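The inference step described in this abstract (deciding which candidate statements follow from the ordering premises) can be illustrated with a minimal sketch. This is not FSLI: the premises are written by hand as Python predicates rather than produced by a semantic parser, and exhaustive enumeration of permutations stands in for constraint logic programming. The entity names, premises, and candidate labels are hypothetical.

```python
from itertools import permutations


def consistent_orderings(entities, constraints):
    """Return every ordering (leftmost first) that satisfies all constraints."""
    results = []
    for perm in permutations(entities):
        # Map each entity to its 0-based position in this ordering.
        pos = {name: i for i, name in enumerate(perm)}
        if all(constraint(pos) for constraint in constraints):
            results.append(perm)
    return results


def entailed(entities, constraints, statement):
    """A statement is entailed if it holds in every consistent ordering."""
    models = consistent_orderings(entities, constraints)
    return bool(models) and all(
        statement({name: i for i, name in enumerate(m)}) for m in models
    )


# Hypothetical problem: three books on a shelf, hand-encoded premises.
entities = ["red", "green", "blue"]
premises = [
    lambda pos: pos["green"] < pos["red"],  # "The green book is left of the red book."
    lambda pos: pos["blue"] == 2,           # "The blue book is the rightmost."
]
candidates = {
    "A": lambda pos: pos["red"] == 0,    # "The red book is the leftmost."
    "B": lambda pos: pos["green"] == 0,  # "The green book is the leftmost."
    "C": lambda pos: pos["blue"] == 0,   # "The blue book is the leftmost."
}

# Keep only the candidates that hold in every ordering allowed by the premises.
answers = [label for label, stmt in candidates.items()
           if entailed(entities, premises, stmt)]
print(answers)
```

Enumeration is only practical for the handful of entities typical of such puzzles; a constraint solver prunes the search instead of listing every permutation.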
Abstract: This paper proposes a deep learning-based method to identify the segments of a clinical note that correspond to ICD-9 broad categories; the segments are then color-coded with respect to 17 ICD-9 categories. The proposed Medical Segment Colorer (MSC) architecture is a pipeline framework that works in three stages: (1) word categorization, (2) phrase allocation, and (3) document classification. MSC uses gated recurrent unit neural networks (GRUs) to map from an input document to word multi-labels to phrase allocations, and uses the statistical median to map phrase allocations to document multi-labels. We compute variable-length segment coloring from overlapping phrase allocation probabilities. These cross-level bidirectional contextual links identify adaptive context and then produce segment coloring. We train and evaluate MSC using the document-labeled MIMIC-III clinical notes. Training is conducted solely using document multi-labels, without any information on phrases, segments, or words. In addition to coloring a clinical note, MSC generates two byproducts: document multi-labeling and word tagging, that is, the creation of ICD-9 category keyword lists based on segment coloring. Compared with methods whose purpose is to produce justifiable document multi-labels, the document multi-labels that MSC produces as a byproduct reach a micro-average F1-score of 64%, versus 52.4% for the CAML (CNN attention multi label) method. To evaluate MSC segment coloring, medical practitioners independently assigned the colors to broad ICD-9 categories, given a sample of 40 colored notes and a sample of 50 words related to each category based on the word tags. Binary scoring of this evaluation yields a median of 83.3% and a mean of 63.7%.
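The median-based step that maps phrase allocations to a document multi-label can be sketched as follows. This is an illustration under stated assumptions, not the MSC implementation: the per-phrase probabilities are made-up numbers rather than GRU outputs, only three categories are shown instead of 17, and the 0.5 decision threshold and the function name `document_labels` are hypothetical.

```python
from statistics import median


def document_labels(phrase_probs, threshold=0.5):
    """Aggregate per-phrase category probabilities into a document multi-label.

    phrase_probs: one row per phrase, one probability per category.
    threshold: assumed cut-off for assigning a category to the document.
    """
    n_categories = len(phrase_probs[0])
    # Median over all phrases, computed separately for each category.
    medians = [median(row[c] for row in phrase_probs) for c in range(n_categories)]
    labels = [m >= threshold for m in medians]
    return labels, medians


# Hypothetical allocation probabilities for 3 phrases over 3 categories.
phrase_probs = [
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.6],
    [0.7, 0.3, 0.7],
]
labels, medians = document_labels(phrase_probs)
print(medians)  # per-category medians
print(labels)   # per-category document labels
```

The median is less sensitive than the mean to a few phrases with extreme probabilities, which is one plausible reason to prefer it for this aggregation.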