Abstract:Many useful tasks on scientific documents, such as topic classification and citation prediction, involve corpora that span multiple scientific domains. Typically, such tasks are accomplished by representing the text with a vector embedding obtained from a Transformer's single CLS token. In this paper, we argue that using multiple CLS tokens could make a Transformer better specialize to multiple scientific domains. We present Multi2SPE: it encourages each of multiple CLS tokens to learn diverse ways of aggregating token embeddings, then sums them up together to create a single vector representation. We also propose our new multi-domain benchmark, Multi-SciDocs, to test scientific paper vector encoders under multi-domain settings. We show that Multi2SPE reduces error by up to 25 percent in multi-domain citation prediction, while requiring only a negligible amount of computation in addition to one BERT forward pass.
Abstract:For many business applications, we often seek to analyze sentiments associated with any arbitrary aspects of commercial products, despite having a very limited amount of labels or even without any labels at all. However, existing aspect target sentiment classification (ATSC) models are not trainable if annotated datasets are not available. Even with labeled data, they fall short of reaching satisfactory performance. To address this, we propose simple approaches that better solve ATSC with natural language prompts, enabling the task under zero-shot cases and enhancing supervised settings, especially for few-shot cases. Under the few-shot setting for SemEval 2014 Task 4 laptop domain, our method of reformulating ATSC as an NLI task outperforms supervised SOTA approaches by up to 24.13 accuracy points and 33.14 macro F1 points. Moreover, we demonstrate that our prompts could handle implicitly stated aspects as well: our models reach about 77% accuracy on detecting sentiments for aspect categories (e.g., food), which do not necessarily appear within the text, even though we trained the models only with explicitly mentioned aspect terms (e.g., fajitas) from just 16 reviews - while the accuracy of the no-prompt baseline is only around 65%.
Abstract:This dissertation investigates integer linear programming (ILP) formulation of Bayesian Network structure learning problem. We review the definition and key properties of Bayesian network and explain score metrics used to measure how well certain Bayesian network structure fits the dataset. We outline the integer linear programming formulation based on the decomposability of score metrics. In order to ensure acyclicity of the structure, we add ``cluster constraints'' developed specifically for Bayesian network, in addition to cycle constraints applicable to directed acyclic graphs in general. Since there would be exponential number of these constraints if we specify them fully, we explain the methods to add them as cutting planes without declaring them all in the initial model. Also, we develop a heuristic algorithm that finds a feasible solution based on the idea of sink node on directed acyclic graphs. We implemented the ILP formulation and cutting planes as a \textsf{Python} package, and present the results of experiments with different settings on reference datasets.
Abstract:In this report, we present qualitative analysis of Monte Carlo (MC) dropout method for measuring model uncertainty in neural network (NN) models. We first consider the sources of uncertainty in NNs, and briefly review Bayesian Neural Networks (BNN), the group of Bayesian approaches to tackle uncertainties in NNs. After presenting mathematical formulation of MC dropout, we proceed to suggesting potential benefits and associated costs for using MC dropout in typical NN models, with the results from our experiments.