Abstract:In this work, we present the first dataset, \dataset, for performing event extraction from conversational email threads. To this end, we first propose a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes $\sim$4K emails annotated with $\sim$9K event instances. To understand the task's challenges, we conduct a series of experiments comparing two common lines of approaches for event extraction, i.e., sequence labeling and generative end-to-end extraction (including few-shot GPT-3.5). Our results show that the task of email event extraction is far from solved, owing to challenges such as extracting non-continuous, shared trigger spans, extracting non-named entity arguments, and modeling the email conversational history. Our work thus calls for further investigation of this domain-specific event extraction task.\footnote{The source code and dataset can be obtained from \url{https://github.com/salokr/Email-Event-Extraction}.}
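As a rough illustration of why non-continuous trigger spans are difficult for sequence labeling, the sketch below decodes BIO tags into spans. It assumes a plain BIO tagging scheme, which the abstract does not specify; the label names `Trigger` and `Time` and the example sentence are hypothetical and are not taken from the dataset's taxonomy.

```python
def bio_to_spans(tokens, tags):
    """Decode BIO tags into (label, text) spans of contiguous tokens."""
    spans = []
    current = None  # (label, [tokens]) for the span being built, if any
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes the open span and starts a new one.
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            # An I- tag extends the open span only if the label matches.
            current[1].append(token)
        else:
            # "O" (or a stray I- tag, treated here as "O") closes the span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(toks)) for label, toks in spans]


# Hypothetical example: the trigger "set ... up" is split by other tokens.
tokens = ["Can", "we", "set", "the", "meeting", "up", "for", "Friday", "?"]
tags = ["O", "O", "B-Trigger", "O", "O", "B-Trigger", "O", "B-Time", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
# [('Trigger', 'set'), ('Trigger', 'up'), ('Time', 'Friday')]
```

Under this scheme the single trigger "set ... up" is recovered as two unrelated spans, so a labeling model needs some additional mechanism to link them.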
Abstract:Humanity faces a plethora of challenges associated with infectious diseases, which kill more than 6 million people a year. Although continuous efforts have been made to mitigate the damage from such events, many persistent challenges remain. One such issue, which we address here, is the assessment and prediction of epidemics. In this field of study, traditional and ad-hoc models frequently fail to provide proper predictive situation awareness (PSAW), characterized by understanding the current situation and predicting future situations. Comprehensive PSAW for infectious disease can support decision making and help hinder disease spread. In this paper, we develop a computing system platform focusing on collective intelligence causal modeling to support PSAW in the domain of infectious disease. Analyses of global epidemics require the integration of multiple different data and models, which can originate from multiple independent researchers. These models should be integrated to assess and predict infectious disease accurately from a holistic view. The system shall provide three main functions: (1) collaborative causal modeling, (2) causal model integration, and (3) causal model reasoning. These functions are supported by subject-matter experts and artificial intelligence (AI), with uncertainty treatment. Subject-matter experts, acting as a collective intelligence, develop causal models and integrate them into one joint causal model. The integrated causal model shall be used to reason about: (1) the past, regarding how the causal factors occurred; (2) the present, regarding how the disease is spreading now; and (3) the future, regarding how it will proceed. Finally, we introduce one use case of predictive situation awareness for the Ebola virus disease.
Abstract:Hybrid Bayesian Networks (HBNs), which contain both discrete and continuous variables, arise naturally in many application areas (e.g., image understanding, data fusion, medical diagnosis, fraud detection). This paper concerns inference in an important subclass of HBNs, the conditional Gaussian (CG) networks, in which all continuous random variables have Gaussian distributions and all children of continuous random variables must be continuous. Inference in CG networks can be NP-hard even for special-case structures, such as poly-trees, where inference in discrete Bayesian networks can be performed in polynomial time. Therefore, approximate inference is required. In approximate inference, it is often necessary to trade off accuracy against solution time. This paper presents an extension to the Hybrid Message Passing inference algorithm for general CG networks and an algorithm for optimizing its accuracy given a bound on computation time. The extended algorithm uses Gaussian mixture reduction to prevent an exponential increase in the number of Gaussian mixture components. The trade-off algorithm performs pre-processing to find optimal run-time settings for the extended algorithm. Experimental results for four CG networks compare performance of the extended algorithm with existing algorithms and show the optimal settings for these CG networks.
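The idea of Gaussian mixture reduction mentioned above can be sketched as follows. This is a minimal one-dimensional illustration, assuming a greedy pairwise moment-preserving merge; the abstract does not say which reduction method the extended algorithm uses, and the function names and the merge cost here are illustrative choices, not the authors' implementation.

```python
import math


def merge(c1, c2):
    """Moment-preserving merge of two weighted 1-D Gaussians (w, mean, var)."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Merged variance = within-component variance + spread of the two means.
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def reduce_mixture(components, max_components):
    """Greedily merge the cheapest pair until max_components remain."""
    comps = list(components)
    while len(comps) > max_components:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                wi, mi, _ = comps[i]
                wj, mj, _ = comps[j]
                # Assumed cost: weighted squared distance between the means.
                cost = wi * wj / (wi + wj) * (mi - mj) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)]
        comps.append(merged)
    return comps


def mixture_moments(comps):
    """Overall mean and variance of a weighted Gaussian mixture."""
    w = sum(c[0] for c in comps)
    mean = sum(c[0] * c[1] for c in comps) / w
    var = sum(c[0] * (c[2] + (c[1] - mean) ** 2) for c in comps) / w
    return mean, var


# Hypothetical mixture: two nearly identical components and one distant one.
mixture = [(0.25, 0.0, 1.0), (0.25, 0.1, 1.0), (0.5, 5.0, 2.0)]
reduced = reduce_mixture(mixture, max_components=2)
print(len(reduced), mixture_moments(mixture), mixture_moments(reduced))
```

Capping the number of components after each message-passing step keeps the mixture size bounded, at the price of an approximation error; the cap is the kind of run-time setting that an accuracy-versus-time trade-off would tune.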