Abstract: Probabilistic generative models of graphs are important tools that enable representation and sampling. Many recent works have created probabilistic models of graphs that are capable of representing not only entity interactions but also their attributes. However, given a generative model of random attributed graph(s), the general conditions that establish goodness of fit are not clear a priori. In this paper, we define goodness of fit in terms of the mean square contingency coefficient for random binary networks. For this statistic, we outline a procedure for assessing the quality of the structure of a learned attributed graph by ensuring that the discrepancy of the mean square contingency coefficient (constant or random) is minimal with high probability. We apply these criteria to verify the representation capability of a probabilistic generative model for various popular types of graph models.
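As a rough illustration of the statistic named above, the following sketch computes the mean square contingency coefficient (often written as phi) for two binary variables from their 2x2 contingency table. Pairing the binary attribute values found at the two endpoints of each edge is an assumption made here for illustration, as are the helper names `phi_coefficient` and `edge_attribute_pairs`; this is not the paper's goodness-of-fit procedure.

```python
import math


def phi_coefficient(pairs):
    """Phi coefficient for an iterable of (x, y) pairs with x, y in {0, 1}."""
    n11 = n10 = n01 = n00 = 0
    for x, y in pairs:
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    # Product of the four marginal totals of the 2x2 table.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Convention chosen for this sketch: report 0.0 when a marginal is empty.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def edge_attribute_pairs(edges, attr):
    """Hypothetical helper: attribute values at both endpoints of each edge.

    Edges are treated as ordered (u, v) pairs.
    """
    return [(attr[u], attr[v]) for u, v in edges]


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
attr = {0: 1, 1: 1, 2: 0, 3: 0}
phi = phi_coefficient(edge_attribute_pairs(edges, attr))
```

A discrepancy check in this spirit would compare the value computed on an observed graph with values computed on graphs sampled from the learned model.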
Abstract: Assessment of job performance, personalized health, and psychometric measures are domains where data-driven and ubiquitous computing has the potential for profound future impact. Existing techniques use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits to assess the well-being and cognitive attributes of individuals. However, these techniques can neither predict an individual's well-being and psychological traits in a global manner nor address the challenges of processing the available data, which are incomplete and noisy. In this paper, we create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance. We design data mining techniques as a benchmark and use real, noisy, and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized, well-validated tests. The study included 757 participants who were knowledge workers in organizations across the USA with varied work roles. We developed a data mining framework to extract the meaningful predictors for each of the 19 variables under consideration. Our model is the first benchmark that combines these various instrument-derived variables in a single framework to understand people's behavior by leveraging real, uncurated data from wearable, mobile, and social media sources. We verify our approach experimentally using the data obtained from our longitudinal study. The results show that our framework is consistently reliable and predicts the variables under study better than the baselines when prediction is restricted to the noisy, incomplete data.
Abstract: Bayesian networks (BNs) are used for inference and sampling by exploiting conditional independence among random variables. Context-specific independence (CSI) is a property of graphical models where additional independence relations arise in the context of particular values of random variables (RVs). Identifying and exploiting CSI properties can simplify inference. Some generative network models (models that generate social/information network samples from a network distribution P(G)), with complex interactions among a set of RVs, can be represented with probabilistic graphical models, in particular with BNs. In the present work we show one such case. We discuss how a mixed Kronecker Product Graph Model can be represented as a BN, and study the BN properties that can be used for efficient sampling. Specifically, we show that instead of exhibiting CSI properties, the model has deterministic context-specific dependence (DCSD). Exploiting this property focuses the sampling method on a subset of the sampling space, which improves efficiency.
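For readers unfamiliar with Kronecker Product Graph Models, the sketch below shows the plain (non-mixed) variant under common assumptions: an initiator matrix `theta` of edge probabilities is raised to its k-th Kronecker power, and each edge is then drawn independently. The function names are hypothetical, and this naive sampler visits every entry of the probability matrix; it does not implement the mixed model, its BN representation, or the DCSD-based sampling described above.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def kron_power(theta, k):
    """k-th Kronecker power of theta, for k >= 1."""
    p = theta
    for _ in range(k - 1):
        p = kron(p, theta)
    return p


def sample_kpgm(theta, k, rng):
    """Draw one adjacency matrix with independent Bernoulli(P[i][j]) edges."""
    p = kron_power(theta, k)
    return [[1 if rng.random() < pij else 0 for pij in row] for row in p]


# Illustrative initiator matrix; the values are assumptions, not from the paper.
theta = [[0.9, 0.5], [0.5, 0.1]]
adjacency = sample_kpgm(theta, 3, random.Random(0))  # 8 x 8 binary matrix
```

Because this sampler costs time proportional to the number of node pairs, methods that restrict attention to a subset of the sampling space are attractive for large graphs.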