Abstract:Current literature in criminal justice analytics often focuses on predicting the likelihood of recidivism (repeat offenses committed by released defendants), but this problem is fraught with ethical missteps ranging from selection bias in data collection to model interpretability. This paper re-purposes Machine Learning (ML) in criminal justice to identify social determinants of recidivism, with contributions along three dimensions. (1) We shift the focus from predicting which individuals will re-offend to identifying the broader underlying factors that explain differences in recidivism, with the goal of providing a reliable framework for preventative policy intervention. (2) Recidivism models typically agglomerate all individuals into one dataset to carry out ML tasks. We instead apply unsupervised learning to reduce noise and extract homogeneous subgroups of individuals, with a novel heuristic to find the optimal number of subgroups. (3) We subsequently apply supervised learning within the subgroups to determine statistically significant features that are correlated to recidivism. It is our view that this new approach to a long-standing question will serve as a useful guide for the practical application of ML in policymaking.
Abstract:We present algorithms for the detection of a class of heart arrhythmias with the goal of eventual adoption by practicing cardiologists. In clinical practice, detection is based on a small number of meaningful features extracted from the heartbeat cycle. However, techniques proposed in the literature use high dimensional vectors consisting of morphological, and time based features for detection. Using electrocardiogram (ECG) signals, we found smaller subsets of features sufficient to detect arrhythmias with high accuracy. The features were found by an iterative step-wise feature selection method. We depart from common literature in the following aspects: 1. As opposed to a high dimensional feature vectors, we use a small set of features with meaningful clinical interpretation, 2. we eliminate the necessity of short-duration patient-specific ECG data to append to the global training data for classification 3. We apply semi-parametric classification procedures (in an ensemble framework) for arrhythmia detection, and 4. our approach is based on a reduced sampling rate of ~ 115 Hz as opposed to 360 Hz in standard literature.