Abstract:In this paper, we introduce SailCompass, a reproducible and robust evaluation benchmark for assessing Large Language Models (LLMs) on Southeast Asian Languages (SEA). SailCompass encompasses three main SEA languages, eight primary tasks including 14 datasets covering three task types (generation, multiple-choice questions, and classification). To improve the robustness of the evaluation approach, we explore different prompt configurations for multiple-choice questions and leverage calibrations to improve the faithfulness of classification tasks. With SailCompass, we derive the following findings: (1) SEA-specialized LLMs still outperform general LLMs, although the gap has narrowed; (2) A balanced language distribution is important for developing better SEA-specialized LLMs; (3) Advanced prompting techniques (e.g., calibration, perplexity-based ranking) are necessary to better utilize LLMs. All datasets and evaluation scripts are public.
Abstract:The healthcare sector has experienced a rapid accumulation of digital data recently, especially in the form of electronic health records (EHRs). EHRs constitute a precious resource that IS researchers could utilize for clinical applications (e.g., morbidity prediction). Deep learning seems like the obvious choice to exploit this surfeit of data. However, numerous studies have shown that deep learning does not enjoy the same kind of success on EHR data as it has in other domains; simple models like logistic regression are frequently as good as sophisticated deep learning ones. Inspired by this observation, we develop a novel model called rational logistic regression (RLR) that has standard logistic regression (LR) as its special case (and thus inherits LR's inductive bias that aligns with EHR data). RLR has rational series as its theoretical underpinnings, works on longitudinal time-series data, and learns interpretable patterns. Empirical comparisons on real-world clinical tasks demonstrate RLR's efficacy.
Abstract:Every publicly traded company in the US is required to file an annual 10-K financial report, which contains a wealth of information about the company. In this paper, we propose an explainable deep-learning model, called FinBERT-XRC, that takes a 10-K report as input, and automatically assesses the post-event return volatility risk of its associated company. In contrast to previous systems, our proposed model simultaneously offers explanations of its classification decision at three different levels: the word, sentence, and corpus levels. By doing so, our model provides a comprehensive interpretation of its prediction to end users. This is particularly important in financial domains, where the transparency and accountability of algorithmic predictions play a vital role in their application to decision-making processes. Aside from its novel interpretability, our model surpasses the state of the art in predictive accuracy in experiments on a large real-world dataset of 10-K reports spanning six years.
Abstract:Argument mining involves multiple sub-tasks that automatically identify argumentative elements, such as claim detection, evidence extraction, stance classification, etc. However, each subtask alone is insufficient for a thorough understanding of the argumentative structure and reasoning process. To learn a complete view of an argument essay and capture the interdependence among argumentative components, we need to know what opinions people hold (i.e., claims), why those opinions are valid (i.e., supporting evidence), which source the evidence comes from (i.e., evidence type), and how those claims react to the debating topic (i.e., stance). In this work, we for the first time propose a challenging argument quadruplet extraction task (AQE), which can provide an all-in-one extraction of four argumentative components, i.e., claims, evidence, evidence types, and stances. To support this task, we construct a large-scale and challenging dataset. However, there is no existing method that can solve the argument quadruplet extraction. To fill this gap, we propose a novel quad-tagging augmented generative approach, which leverages a quadruplet tagging module to augment the training of the generative framework. The experimental results on our dataset demonstrate the empirical superiority of our proposed approach over several strong baselines.
Abstract:Document-level relation extraction (DocRE) predicts relations for entity pairs that rely on long-range context-dependent reasoning in a document. As a typical multi-label classification problem, DocRE faces the challenge of effectively distinguishing a small set of positive relations from the majority of negative ones. This challenge becomes even more difficult to overcome when there exists a significant number of annotation errors in the dataset. In this work, we aim to achieve better integration of both the discriminability and robustness for the DocRE problem. Specifically, we first design an effective loss function to endow high discriminability to both probabilistic outputs and internal representations. We innovatively customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning problems. To ameliorate the impact of label errors, we equipped our method with a novel negative label sampling strategy to strengthen the model robustness. In addition, we introduce two new data regimes to mimic more realistic scenarios with annotation errors and evaluate our sampling strategy. Experimental results verify the effectiveness of each component and show that our method achieves new state-of-the-art results on the DocRED dataset, its recently cleaned version, Re-DocRED, and the proposed data regimes.
Abstract:Knowledge graph embeddings (KGEs) compactly encode multi-relational knowledge graphs (KGs). Existing KGE models rely on geometric operations to model relational patterns. Euclidean (circular) rotation is useful for modeling patterns such as symmetry, but cannot represent hierarchical semantics. In contrast, hyperbolic models are effective at modeling hierarchical relations, but do not perform as well on patterns on which circular rotation excels. It is crucial for KGE models to unify multiple geometric transformations so as to fully cover the multifarious relations in KGs. To do so, we propose BiQUE, a novel model that employs biquaternions to integrate multiple geometric transformations, viz., scaling, translation, Euclidean rotation, and hyperbolic rotation. BiQUE makes the best trade-offs among geometric operators during training, picking the best one (or their best combination) for each relation. Experiments on five datasets show BiQUE's effectiveness.
Abstract:E-commerce platforms categorize their products into a multi-level taxonomy tree with thousands of leaf categories. Conventional methods for product categorization are typically based on machine learning classification algorithms. These algorithms take product information as input (e.g., titles and descriptions) to classify a product into a leaf category. In this paper, we propose a new paradigm based on machine translation. In our approach, we translate a product's natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy. In our experiments on two large real-world datasets, we show that our approach achieves better predictive accuracy than a state-of-the-art classification system for product categorization. In addition, we demonstrate that our machine translation models can propose meaningful new paths between previously unconnected nodes in a taxonomy tree, thereby transforming the taxonomy into a directed acyclic graph (DAG). We discuss how the resultant taxonomy DAG promotes user-friendly navigation, and how it is more adaptable to new products.