Abstract:Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether or not specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample that were sick explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health.
Abstract:We present a descriptive analysis of Twitter data. Our study focuses on extracting the main side effects associated with HIV treatments. The crux of our work was the identification of personal tweets referring to HIV. We summarize our results in an infographic aimed at the general public. In addition, we present a measure of user sentiment based on hand-rated tweets.