Robert M. Rolfe

Topic Similarity Networks: Visual Analytics for Large Document Sets

Sep 26, 2014

Arun S. Maiya, Robert M. Rolfe

Abstract: We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia.

* 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
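
To make the abstract's definition concrete, here is a minimal sketch of a topic similarity network: nodes are topics, and two topics are linked when their word distributions are similar enough. The abstract does not say which similarity measure or linking rule the paper uses, so the cosine similarity, the `threshold` parameter, the function names, and the toy topic vectors below are all assumptions chosen for illustration, not the authors' method.

```python
import math
from itertools import combinations


def cosine(p, q):
    """Cosine similarity between two equal-length topic-word vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def topic_similarity_network(topic_word, threshold=0.5):
    """Build an undirected graph as {topic: set of linked topics}.

    Two topics are linked when their cosine similarity is at least
    `threshold` (an assumed rule; the paper may use a different one).
    """
    graph = {topic: set() for topic in topic_word}
    for a, b in combinations(topic_word, 2):
        if cosine(topic_word[a], topic_word[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical topic-word distributions over a 4-word vocabulary.
# In practice these rows would come from a fitted LDA model.
topics = {
    "genetics": [0.6, 0.3, 0.1, 0.0],
    "proteins": [0.5, 0.4, 0.1, 0.0],
    "astronomy": [0.0, 0.1, 0.2, 0.7],
}

network = topic_similarity_network(topics)
for topic, neighbors in network.items():
    print(topic, "->", sorted(neighbors))
```

With these toy vectors, the two biology-flavored topics end up linked to each other and the astronomy topic stays isolated, which is the kind of grouping of topics into larger themes that the abstract describes.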

Exploratory Analysis of Highly Heterogeneous Document Collections

Aug 11, 2013

Arun S. Maiya, John P. Thompson, Francisco Loaiza-Lemos, Robert M. Rolfe

Abstract: We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.

* 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining