Abstract:The use of the iris as a biometric identifier has increased dramatically over the last 30 years, prompting privacy and security concerns about the use of iris images in research. It can be difficult to acquire iris image databases due to ethical concerns, and this can be a barrier for those performing biometrics research. In this paper, we describe and show how to create a database of realistic, biometrically unidentifiable colored iris images by training a diffusion model within an open-source diffusion framework. Not only were we able to verify that our model is capable of creating iris textures that are biometrically unique from the training data, but we were also able to verify that our model output creates a full distribution of realistic iris pigmentations. We highlight the fact that the utility of diffusion networks to achieve these criteria with relative ease, warrants additional research in its use within the context of iris database generation and presentation attack security.
Abstract:Bias detection and mitigation is an active area of research in machine learning. This work extends previous research done by the authors to provide a rigorous and more complete analysis of the bias found in AI predictive models. Admissions data spanning six years was used to create an AI model to determine whether a given student would be directly admitted into the School of Science under various scenarios at a large urban research university. During this time, submission of standardized test scores as part of an application became optional which led to interesting questions about the impact of standardized test scores on admission decisions. We developed and analyzed AI models to understand which variables are important in admissions decisions, and how the decision to exclude test scores affects the demographics of the students who are admitted. We then evaluated the predictive models to detect and analyze biases these models may carry with respect to three variables chosen to represent sensitive populations: gender, race, and whether a student was the first in his or her family to attend college. We also extended our analysis to show that the biases detected were persistent. Finally, we included several fairness metrics in our analysis and discussed the uses and limitations of these metrics.
Abstract:Recent data mining research has focused on the analysis of social media text, content and networks to identify suicide ideation online. However, there has been limited research on the temporal dynamics of users and suicide ideation. In this work, we use time-to-event modeling to identify which subreddits have a higher association with users transitioning to posting on r/suicidewatch. For this purpose we use a Cox proportional hazards model that takes as input text and subreddit network features and outputs a probability distribution for the time until a Reddit user posts on r/suicidewatch. In our analysis we find a number of statistically significant features that predict earlier transitions to r/suicidewatch. While some patterns match existing intuition, for example r/depression is positively associated with posting sooner on r/suicidewatch, others were more surprising (for example, the average time between a high risk post on r/Wishlist and a post on r/suicidewatch is 10.2 days). We then discuss these results as well as directions for future research.