Abstract: Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion and human annotation to text retrieval, analysis, generation, and visualization. We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner. The framework introduces a uniform data representation to encode heterogeneous results from a wide range of NLP tasks. It offers a large repository of processors for NLP tasks, visualization, and annotation, which can be easily assembled with full interoperability under the unified representation. The highly extensible framework allows plugging in custom processors from external off-the-shelf NLP and deep learning libraries. The whole framework is delivered through two modularized yet integrable open-source projects, namely Forte (for workflow infrastructure and NLP function processors) and Stave (for user interaction, visualization, and annotation).
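The composability claim above can be illustrated with a minimal sketch. It is not Forte's or Stave's API: the names `Span`, `Document`, `sentence_splitter`, `tokenizer`, and `run_pipeline` are hypothetical, chosen only to show how independent processors can interoperate when they all read from and write to one shared representation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Span:
    """One annotation over the text (hypothetical uniform record)."""
    kind: str   # e.g. "sentence" or "token"
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Document:
    """Shared container that every processor reads and extends."""
    text: str
    spans: List[Span] = field(default_factory=list)

    def select(self, kind: str) -> List[Span]:
        return [s for s in self.spans if s.kind == kind]


Processor = Callable[[Document], None]


def sentence_splitter(doc: Document) -> None:
    """Naive splitter: a sentence ends at '.', '!' or '?'."""
    start = 0
    for i, ch in enumerate(doc.text):
        if ch in ".!?":
            doc.spans.append(Span("sentence", start, i + 1))
            start = i + 1
    if doc.text[start:].strip():
        doc.spans.append(Span("sentence", start, len(doc.text)))


def tokenizer(doc: Document) -> None:
    """Consumes the splitter's output: tokenizes inside each sentence span."""
    for sent in doc.select("sentence"):
        for m in re.finditer(r"\w+", doc.text[sent.begin:sent.end]):
            doc.spans.append(
                Span("token", sent.begin + m.start(), sent.begin + m.end())
            )


def run_pipeline(text: str, processors: List[Processor]) -> Document:
    """Apply processors in order to a single shared Document."""
    doc = Document(text)
    for process in processors:
        process(doc)
    return doc
```

Because each processor depends only on the shared container, a processor wrapping an external library could in principle be swapped in without changing the ones downstream of it.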
Abstract: We search for digital biomarkers for Parkinson's Disease by observing approximately repetitive patterns that match hypothesized periodic step and stride cycles. These observations are modeled as a cycle of hidden states, with randomness allowing deviation from a canonical pattern of transitions and emissions, under the hypothesis that the averaged features of the hidden states informatively characterize classes of patients and controls. We propose a Hidden Semi-Markov Model (HSMM), a latent-state model that emits 3D acceleration vectors. Transitions and emissions are inferred from data. We fit a separate model for each unique device and training label. Hidden Markov Models (HMMs) force a geometric distribution on the time spent in each state before transitioning to a new state. Our HSMM instead allows us to specify the distribution of state durations. This modification is more effective because we are more interested in each state's duration than in the sequence of distinct states, and it allows these durations to be included in the feature vector.
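The contrast between HMM and HSMM dwell times can be made concrete with a small simulation. This is a sketch under stated assumptions, not the authors' model: the function names, the three-state cycle, the self-transition probability 0.9, and the duration table are all illustrative values chosen here. An HMM state with self-transition probability p has a geometric dwell time with mean 1/(1-p), whereas an HSMM draws the dwell time from whatever distribution is specified.

```python
import random


def hmm_duration(stay_prob, rng):
    """Dwell time implied by an HMM self-transition: geometric, mean 1/(1-stay_prob)."""
    duration = 1
    while rng.random() < stay_prob:
        duration += 1
    return duration


def hsmm_duration(duration_pmf, rng):
    """Dwell time drawn from an explicitly specified table {duration: probability}."""
    durations = list(duration_pmf)
    weights = [duration_pmf[d] for d in durations]
    return rng.choices(durations, weights=weights, k=1)[0]


def sample_cycle(n_states, n_segments, draw_duration):
    """Visit states 0..n_states-1 cyclically; return (state, duration) segments."""
    return [(i % n_states, draw_duration()) for i in range(n_segments)]


def mean_duration_features(segments, n_states):
    """Average dwell time per state, usable as entries of a feature vector."""
    totals = [0] * n_states
    counts = [0] * n_states
    for state, duration in segments:
        totals[state] += duration
        counts[state] += 1
    return [totals[s] / counts[s] if counts[s] else 0.0 for s in range(n_states)]


if __name__ == "__main__":
    rng = random.Random(0)
    pmf = {8: 0.2, 10: 0.6, 12: 0.2}  # illustrative duration table, mean 10
    hmm = sample_cycle(3, 3000, lambda: hmm_duration(0.9, rng))
    hsmm = sample_cycle(3, 3000, lambda: hsmm_duration(pmf, rng))
    print("HMM mean durations: ", mean_duration_features(hmm, 3))
    print("HSMM mean durations:", mean_duration_features(hsmm, 3))
```

Both settings have a mean dwell time of about 10 samples, but the geometric case puts most of its mass on very short stays, while the explicit table keeps every stay between 8 and 12. That difference in shape is what makes per-state durations a meaningful feature in the semi-Markov setting.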