Anna Jurek-Loughrey

ClusterGraph: a new tool for visualization and compression of multidimensional data

Nov 08, 2024

Paweł Dłotko, Davide Gurnari, Mathis Hallier, Anna Jurek-Loughrey

Abstract: Understanding the global organization of complicated and high dimensional data is of primary interest for many branches of applied sciences. It is typically achieved by applying dimensionality reduction techniques mapping the considered data into lower dimensional space. This family of methods, while preserving local structures and features, often misses the global structure of the dataset. Clustering techniques are another class of methods operating on the data in the ambient space. They group together points that are similar according to a fixed similarity criterion; however, unlike dimensionality reduction techniques, they do not provide information about the global organization of the data. Leveraging ideas from Topological Data Analysis, in this paper we provide an additional layer on the output of any clustering algorithm. This data structure, ClusterGraph, provides information about the global layout of clusters obtained from the considered clustering algorithm. Appropriate measures are provided to assess the quality and usefulness of the obtained representation. Subsequently, the ClusterGraph, possibly with an appropriate structure-preserving simplification, can be visualized and used in synergy with state-of-the-art exploratory data analysis techniques.

* 19 pages, 8 figures
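
The idea of a graph layered on top of a clustering output can be illustrated with a minimal sketch. It assumes that nodes are clusters and that each edge is weighted by the mean Euclidean distance between the points of the two clusters (average linkage). The abstract does not say which inter-cluster distance the paper uses, so that choice, the function name `cluster_graph`, and the toy data are illustrative assumptions only.

```python
from itertools import combinations
from math import dist


def cluster_graph(points, labels):
    """Return {(a, b): weight} for every pair of cluster labels a < b.

    The weight is the mean Euclidean distance between points of the two
    clusters (average linkage). This is an assumption made for illustration;
    the abstract does not specify the distance used.
    """
    # Group the points by the label assigned by any clustering algorithm.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # One weighted edge per pair of clusters.
    edges = {}
    for a, b in combinations(sorted(clusters), 2):
        total = sum(dist(p, q) for p in clusters[a] for q in clusters[b])
        edges[(a, b)] = total / (len(clusters[a]) * len(clusters[b]))
    return edges


# Toy data: two nearby clusters (0 and 1) and one distant cluster (2).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (0.0, 20.0)]
labels = [0, 0, 1, 1, 2]
graph = cluster_graph(points, labels)
```

In this toy example the edge between clusters 0 and 1 is the shortest, which is the kind of global layout information that cluster labels alone do not carry.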

Exploring Thematic Coherence in Fake News

Dec 17, 2020

Martins Samuel Dogo, Deepak P, Anna Jurek-Loughrey

Abstract: The spread of fake news remains a serious global issue; understanding and curtailing it is paramount. One way of differentiating between deceptive and truthful stories is by analyzing their coherence. This study explores the use of topic models to analyze the coherence of cross-domain news shared online. Experimental results on seven cross-domain datasets demonstrate that fake news shows a greater thematic deviation between its opening sentences and its remainder.

* 10 pages, 1 figure, to be published in Proceedings of the 8th International Workshop on News Recommendation and Analytics (INRA 2020)
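
A thematic deviation between an article's opening and its remainder can be illustrated by comparing two topic distributions. The sketch below assumes the Hellinger distance as the deviation measure and uses made-up topic proportions. The abstract names neither the measure nor any values, so both are illustrative assumptions rather than the authors' method or results.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0 for identical distributions and 1 for distributions with
    disjoint support. Assumed here as the deviation measure.
    """
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


# Hypothetical topic proportions over three topics (not real data).
opening = [0.7, 0.2, 0.1]
coherent_rest = [0.6, 0.3, 0.1]  # remainder stays on the opening's themes
drifting_rest = [0.1, 0.2, 0.7]  # remainder shifts to a different theme

coherent_gap = hellinger(opening, coherent_rest)
drifting_gap = hellinger(opening, drifting_rest)
```

Under these assumptions, a larger gap corresponds to a larger thematic deviation, the property the abstract reports as more pronounced in fake news.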

Hotspot identification for Mapper graphs

Dec 03, 2020

Ciara Frances Loughrey, Nick Orr, Anna Jurek-Loughrey, Paweł Dłotko

Abstract: The Mapper algorithm can be used to build graph-based representations of high-dimensional data capturing structurally interesting features such as loops, flares or clusters. The graph can be further annotated with additional colouring of vertices, allowing location of regions of special interest. For instance, in many applications, such as precision medicine, the Mapper graph has been used to identify unknown compactly localized subareas within the dataset demonstrating unique or unusual behaviours. This task, performed so far by a researcher, can be automated using hotspot analysis. In this work we propose a new algorithm for detecting hotspots in Mapper graphs. It allows the hotspot detection process to be automated. We demonstrate the performance of the algorithm on a number of artificial and real-world datasets. We further demonstrate how our algorithm can be used for the automatic selection of the Mapper lens functions.

* Topological Data Analysis and Beyond Workshop at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences

Oct 21, 2020

Soumya Suvra Ghosal, Deepak P, Anna Jurek-Loughrey

Abstract: Disinformation is often presented in long textual articles, especially when it relates to domains such as health, often seen in relation to COVID-19. These articles are typically observed to have a number of trustworthy sentences among which core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task, which starts with embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, following which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence-level disinformation scorings to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we illustrate that our method is able to identify core disinformation effectively.

* The 22nd International Conference on Information Integration and Web-based Applications & Services (iiWAS '20), Chiang Mai, Thailand