Abstract:Intracranial hemorrhage occurs when blood vessels rupture or leak within the brain tissue or elsewhere inside the skull. It can be caused by physical trauma or by various medical conditions and in many cases leads to death. The treatment must be started as soon as possible, and therefore the hemorrhage should be diagnosed accurately and quickly. The diagnosis is usually performed by a radiologist who analyses a Computed Tomography (CT) scan containing a large number of cross-sectional images throughout the brain. Analysing each image manually can be very time-consuming, but automated techniques can help speed up the process. While much of the recent research has focused on solving this problem by using supervised machine learning algorithms, publicly-available training data remains scarce due to privacy concerns. This problem can be alleviated by unsupervised algorithms. In this paper, we propose a fully-unsupervised algorithm which is based on the mixture models. Our algorithm utilizes the fact that the properties of hemorrhage and healthy tissues follow different distributions, and therefore an appropriate formulation of these distributions allows us to separate them through an Expectation-Maximization process. In addition, our algorithm is able to adaptively determine the number of clusters such that all the hemorrhage regions can be found without including noisy voxels. We demonstrate the results of our algorithm on publicly-available datasets that contain all different hemorrhage types in various sizes and intensities, and our results are compared to earlier unsupervised and supervised algorithms. The results show that our algorithm can outperform the other algorithms with most hemorrhage types.
Abstract:Automated computer vision systems have been applied in many domains including security, law enforcement, and personal devices, but recent reports suggest that these systems may produce biased results, discriminating against people in certain demographic groups. Diagnosing and understanding the underlying true causes of model biases, however, are challenging tasks because modern computer vision systems rely on complex black-box models whose behaviors are hard to decode. We propose to use an encoder-decoder network developed for image attribute manipulation to synthesize facial images varying in the dimensions of gender and race while keeping other signals intact. We use these synthesized images to measure counterfactual fairness of commercial computer vision classifiers by examining the degree to which these classifiers are affected by gender and racial cues controlled in the images, e.g., feminine faces may elicit higher scores for the concept of nurse and lower scores for STEM-related concepts. We also report the skewed gender representations in an online search service on profession-related keywords, which may explain the origin of the biases encoded in the models.
Abstract:In many real-world machine learning problems, feature values are not readily available. To make predictions, some of the missing features have to be acquired, which can incur a cost in money, computational time, or human time, depending on the problem domain. This leads us to the problem of choosing which features to use at the prediction time. The chosen features should increase the prediction accuracy for a low cost, but determining which features will do that is challenging. The choice should take into account the previously acquired feature values as well as the feature costs. This paper proposes a novel approach to address this problem. The proposed approach chooses the most useful features adaptively based on how relevant they are for the prediction task as well as what the corresponding feature costs are. Our approach uses a generic neural network architecture, which is suitable for a wide range of problems. We evaluate our approach on three cost-sensitive datasets, including Yahoo! Learning to Rank Competition dataset as well as two health datasets. We show that our approach achieves high accuracy with a lower cost than the current state-of-the-art approaches.
Abstract:Existing public face datasets are strongly biased toward Caucasian faces, and other races (e.g., Latino) are significantly underrepresented. This can lead to inconsistent model accuracy, limit the applicability of face analytic systems to non-White race groups, and adversely affect research findings based on such skewed data. To mitigate the race bias in these datasets, we construct a novel face image dataset, containing 108,501 images, with an emphasis of balanced race composition in the dataset. We define 7 race groups: White, Black, Indian, East Asian, Southeast Asian, Middle East, and Latino. Images were collected from the YFCC-100M Flickr dataset and labeled with race, gender, and age groups. Evaluations were performed on existing face attribute datasets as well as novel image datasets to measure generalization performance. We find that the model trained from our dataset is substantially more accurate on novel datasets and the accuracy is consistent between race and gender groups.