Abstract: This paper proposes a new method to predict individual political ideology from digital footprints on one of the world's largest online discussion forums. We compiled a unique dataset from the online discussion forum Reddit that contains information on the political ideology of around 91,000 users, as well as records of their comment frequency and the text corpus of their comments in over 190,000 different subforums of interest. Applying a set of statistical learning approaches, we show that information about activity in non-political discussion forums alone can very accurately predict a user's political ideology. Depending on the model, we are able to predict the economic dimension of ideology with an accuracy of up to 90.63% and the social dimension with an accuracy of up to 82.02%. In comparison, using textual features from the actual comments does not improve predictive accuracy. Our paper highlights the importance of revealed digital behaviour as a complement to stated preferences from digital communication when analysing human preferences and behaviour using online data.
Abstract: Alternative data is increasingly adopted to predict human and economic behaviour. This paper introduces a new type of alternative data by re-conceptualising the internet as a data-driven insights platform at a global scale. Using a unique internet activity and location dataset drawn from over 1.5 trillion observations of end-user internet connections, we construct a functional dataset covering over 1,600 cities over a 7-year period with a temporal resolution of 15 minutes. To accurately predict temporal patterns of sleep and work activity from this dataset, we develop a new technique, Segmented Functional Classification Analysis (SFCA), and compare its performance to a wide array of linear, functional, and classification methods. To confirm the wider applicability of SFCA, in a second application we use it to predict sleep and work activity from functional data on US city-wide electricity demand. Across both problems, SFCA is shown to outperform current methods.