Abstract: Documents in the health domain are often annotated with semantic concepts (i.e., terms) from controlled vocabularies. As the volume of these documents grows, the annotation work is increasingly done by algorithms. Compared to humans, automatic indexing algorithms are imperfect and may assign wrong terms to documents, which affects subsequent search tasks where queries contain these terms. In this work, we aim to understand the performance impact of using imperfectly assigned terms in Boolean semantic searches. We used MeSH terms and biomedical literature search as a case study. We evaluated multiple automatic indexing algorithms on real-world Boolean queries composed of MeSH terms, and found that (1) probabilistic logic can handle inaccurately assigned terms better than traditional Boolean logic, (2) query-level performance is mostly limited by the lowest-performing terms in a query, and (3) mixing a small amount of human indexing with automatic indexing can restore excellent query-level performance. These findings have important implications for future work on automatic indexing.
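The contrast between traditional Boolean logic and probabilistic logic over imperfect term assignments can be illustrated with a minimal sketch. This is not the paper's implementation: the document, the confidence scores, and the noisy-AND scoring rule below are illustrative assumptions.

```python
def boolean_and(doc_terms: set[str], query_terms: list[str]) -> bool:
    """Strict Boolean AND: one wrongly missing term rejects the document."""
    return all(t in doc_terms for t in query_terms)


def probabilistic_and(doc_term_probs: dict[str, float],
                      query_terms: list[str]) -> float:
    """Noisy-AND relaxation: multiply per-term assignment probabilities
    instead of requiring exact membership, so an uncertain automatic
    assignment lowers the score rather than discarding the document."""
    score = 1.0
    for t in query_terms:
        score *= doc_term_probs.get(t, 0.0)
    return score


# Example: an automatic indexer assigned "Neoplasms" with low confidence.
doc = {"Humans": 0.98, "Neoplasms": 0.35}
query = ["Humans", "Neoplasms"]

# Thresholding at 0.5 drops the uncertain term, so strict Boolean AND fails.
print(boolean_and({t for t, p in doc.items() if p >= 0.5}, query))  # False
# The probabilistic score still ranks the document, just lower.
print(probabilistic_and(doc, query))  # 0.343
```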
Abstract: When people search for information about a new topic within large document collections, they implicitly construct a mental model of the unfamiliar information space to represent what they currently know and to guide their exploration into the unknown. Building this mental model can be challenging, as it requires not only finding relevant documents but also synthesizing important concepts and the relationships that connect those concepts both within and across documents. This paper describes a novel interactive approach designed to help users construct a mental model of an unfamiliar information space during exploratory search. We propose a new semantic search system that organizes and visualizes important concepts and their relations for a set of search results. A user study ($n=20$) was conducted to compare the proposed approach against a baseline faceted search system on exploratory literature search tasks. Experimental results show that the proposed approach is more effective in helping users recognize relationships between key concepts, leading to a more sophisticated understanding of the search topic while maintaining functionality and usability comparable to a faceted search system.
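One way to organize concepts and relations for a result set is a co-occurrence graph. The sketch below is a minimal illustration under assumptions not stated in the abstract (concepts are already extracted per document; relations are co-occurrence counts); it does not reproduce the proposed system's concept extraction or visualization.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(results: list[dict]) -> Counter:
    """Map (concept_a, concept_b) pairs to co-occurrence counts across the
    retrieved documents; each result carries a 'concepts' list."""
    edges = Counter()
    for doc in results:
        for a, b in combinations(sorted(set(doc["concepts"])), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical search results with pre-extracted concepts.
results = [
    {"concepts": ["insulin", "diabetes", "metformin"]},
    {"concepts": ["diabetes", "metformin"]},
]
print(build_concept_graph(results).most_common(1))
# [(('diabetes', 'metformin'), 2)]
```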
Abstract: Text simplification is concerned with reducing language complexity and improving the readability of professional content so that the text is accessible to readers of different ages and educational levels. Although it is a promising practice for improving the fairness and transparency of text information systems, the notion of text simplification has been treated inconsistently in the existing literature, ranging from assessing the complexity of single words to automatically generating simplified documents. We show that the general problem of text simplification can be formally decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process. In this paper, we present a systematic analysis of the first two steps in this pipeline: 1) predicting the complexity of a given piece of text, and 2) identifying complex components within text deemed complex. We show that these two tasks can be solved separately, using either lexical approaches or state-of-the-art deep learning methods, or jointly through an end-to-end, explainable machine learning predictor. We propose formal evaluation metrics for both tasks, through which we are able to compare the performance of the candidate approaches on multiple datasets from diverse domains.
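The two pipeline steps can be sketched with a purely lexical baseline. The thresholds, features, and function names below are illustrative assumptions for exposition only, not the models evaluated in the paper.

```python
def predict_complexity(text: str, avg_len_threshold: float = 6.0) -> bool:
    """Step 1: flag a piece of text as complex when its average word length
    exceeds a threshold (a crude stand-in for a trained classifier)."""
    words = text.split()
    return bool(words) and sum(len(w) for w in words) / len(words) > avg_len_threshold


def identify_complex_components(text: str, word_len_threshold: int = 9) -> list[str]:
    """Step 2: within text deemed complex, return the words treated as
    complex components (here, simply the long words)."""
    return [w for w in text.split() if len(w) >= word_len_threshold]


sentence = "Pharmacokinetic interactions complicate anticoagulation management"
if predict_complexity(sentence):
    print(identify_complex_components(sentence))
```

An end-to-end predictor would instead learn both decisions jointly, but the separate formulation above mirrors the two evaluation targets described in the abstract.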