Abstract:Mobile devices have become essential for capturing human activity, and eXtended Data Records (XDRs) offer rich opportunities for detailed user behavior modeling, which is useful for designing personalized digital services. Previous studies have primarily focused on aggregated mobile traffic and mobility analyses, often neglecting individual-level insights. This paper introduces a novel approach that explores the dependency between traffic and mobility behaviors at the user level. By analyzing 13 individual features that encompass traffic patterns and various mobility aspects, we enhance the understanding of how these behaviors interact. Our advanced user modeling framework integrates traffic and mobility behaviors over time, allowing for fine-grained dependencies while maintaining population heterogeneity through user-specific signatures. Furthermore, we develop a Markov model that infers traffic behavior from mobility and vice versa, prioritizing significant dependencies while addressing privacy concerns. Using a week-long XDR dataset from 1,337,719 users across several provinces in Chile, we validate our approach, demonstrating its robustness and applicability in accurately inferring user behavior and matching mobility and traffic profiles across diverse urban contexts.
Abstract:Predicting the geographical location of users of social media like Twitter has found several applications in health surveillance, emergency monitoring, content personalization, and social studies in general. In this work we contribute to the research in this area by designing and evaluating new methods based on the literature of weighted multigraphs combined with state-of-the-art deep learning techniques. The explored methods depart from a similar underlying structure (that of an extended mention and/or follower network) but use different information processing strategies, e.g., information diffusion through transductive and inductive algorithms -- RGCNs and GraphSAGE, respectively -- and node embeddings with Node2vec+. These graphs are then combined with attention mechanisms to incorporate the users' text view into the models. We assess the performance of each of these methods and compare them to baseline models in the publicly available Twitter-US dataset; we also make a new dataset available based on a large Twitter capture in Latin America. Finally, our work discusses the limitations and validity of the comparisons among methods in the context of different label definitions and metrics.
Abstract:There is an increasing interest in the NLP community in capturing variations in the usage of language, either through time (i.e., semantic drift), across regions (as dialects or variants) or in different social contexts (i.e., professional or media technolects). Several successful dynamical embeddings have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.