
Daniel Clarke

A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data

Apr 05, 2023

A. Ravishankar Rao, Daniel Clarke, Subrata Garai, Soumyabrata Dey


Abstract: The interactive exploration of large and evolving datasets is challenging because the relationships between the underlying variables may not be fully understood. The data may contain hidden trends and patterns that merit further exploration and analysis. We present a system that methodically explores multiple combinations of variables using a searchlight technique and identifies outliers. An iterative k-means clustering algorithm is applied to features derived through the split-apply-combine paradigm from the database literature, and outliers are identified as singleton or small clusters. This algorithm is swept across the dataset in a searchlight manner. The dimensions that contain outliers are then combined in pairs with other dimensions using a subset scan technique to gain further insight into the outliers. We illustrate this system by analyzing open health care data released by New York State, applying our iterative k-means searchlight followed by subset scanning. Several anomalous trends in the data are identified, including cost overruns at specific hospitals and increases in diagnoses such as suicide. These constitute novel findings in the literature and are of potential use to regulatory agencies, policy makers, and concerned citizens.
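The outlier step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes one scalar feature per group (here a hypothetical mean `cost` per `hospital`, produced by a split-apply-combine pass), uses plain Lloyd's k-means with k = 2, and reads "iterative" as repeatedly removing the members of clusters with at most `min_size` points and re-clustering the rest until no small cluster remains. The helper names (`split_apply_combine`, `kmeans_1d`, `iterative_kmeans_outliers`) and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def split_apply_combine(records, key, value, agg=mean):
    """Split records by `key`, apply `agg` to each group's `value`, combine into {group: feature}."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {g: agg(vals) for g, vals in groups.items()}


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar features; returns one cluster label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initialise the centres from data points
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        new_centers = [
            mean([v for v, lab in zip(values, labels) if lab == c]) if c in labels else centers[c]
            for c in range(k)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels


def iterative_kmeans_outliers(features, k=2, min_size=1, max_rounds=10):
    """Cluster, peel off members of clusters with <= min_size points, recluster the rest."""
    remaining = dict(features)
    outliers = []
    for _ in range(max_rounds):
        if len(remaining) <= k:
            break
        names = list(remaining)
        labels = kmeans_1d([remaining[n] for n in names], k)
        sizes = Counter(labels)
        small = [n for n, lab in zip(names, labels) if sizes[lab] <= min_size]
        if not small:
            break  # no singleton or small clusters left
        for n in small:
            outliers.append(n)
            del remaining[n]
    return outliers


# Toy records (made-up values); hospitals "A".."H" are hypothetical.
records = [
    {"hospital": "A", "cost": 9.0}, {"hospital": "A", "cost": 11.0},
    {"hospital": "B", "cost": 10.5}, {"hospital": "C", "cost": 11.0},
    {"hospital": "D", "cost": 12.0}, {"hospital": "E", "cost": 29.0},
    {"hospital": "F", "cost": 30.0}, {"hospital": "G", "cost": 31.0},
    {"hospital": "H", "cost": 190.0}, {"hospital": "H", "cost": 210.0},
]
features = split_apply_combine(records, "hospital", "cost")
print(iterative_kmeans_outliers(features))  # → ['H']
```

A small k is used here because a large fixed k tends to split ordinary groups into spurious singletons once the extreme points have been removed; how the paper chooses k and the small-cluster threshold is not stated in the abstract.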
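The searchlight sweep and the pairwise follow-up can likewise be sketched, with two loud caveats. First, the detector plugged in below is a swapped-in modified z-score based on the median absolute deviation, used only to keep the example short; it stands in for the iterative k-means step, and any function mapping `{group: value}` to a list of flagged groups can be passed as `detect`. Second, the pairwise step simply re-runs the detector on groups defined by two columns, which is a simplified stand-in for, not an implementation of, the subset scan technique named in the abstract. The function names, the column names (`hospital`, `year`, `cost`), and all values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, median


def group_mean(records, keys, value):
    """Split-apply-combine: mean of `value` for every combination of the `keys` columns."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r[value])
    return {g: mean(v) for g, v in groups.items()}


def robust_outliers(features, threshold=3.5):
    """Stand-in detector (NOT the paper's iterative k-means): modified z-score using the MAD."""
    vals = list(features.values())
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    if mad == 0:
        return []
    return [g for g, v in features.items() if 0.6745 * abs(v - med) / mad > threshold]


def searchlight(records, dims, value, detect=robust_outliers):
    """Sweep `detect` over each dimension, then over pairs that include a flagged dimension."""
    single = {}
    for d in dims:
        hits = detect(group_mean(records, [d], value))
        if hits:
            single[d] = hits
    paired = {}
    for a, b in combinations(dims, 2):
        if a in single or b in single:  # only revisit dimensions that showed outliers
            hits = detect(group_mean(records, [a, b], value))
            if hits:
                paired[(a, b)] = hits
    return single, paired


# Toy records (made-up values); hospital and year labels are hypothetical.
costs = {("A", 2015): 10, ("A", 2016): 11, ("B", 2015): 10, ("B", 2016): 12,
         ("C", 2015): 11, ("C", 2016): 10, ("D", 2015): 12, ("D", 2016): 11,
         ("E", 2015): 11, ("E", 2016): 100}
records = [{"hospital": h, "year": y, "cost": c} for (h, y), c in costs.items()]
single, paired = searchlight(records, ["hospital", "year"], "cost")
print(single)  # → {'hospital': [('E',)]}
print(paired)  # → {('hospital', 'year'): [('E', 2016)]}
```

The single-dimension pass only says that hospital "E" is unusual; pairing it with `year` narrows the anomaly to one hospital-year cell, which is the kind of extra insight the pairwise step is meant to provide.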

2018 International Joint Conference on Neural Networks (IJCNN)
