Abstract: Matrix completion has gained considerable interest in recent years. The goal of matrix completion is to predict the unknown entries of a partially observed matrix using its known entries. Although common applications feature discrete rating-scale data, such as user-product rating matrices in recommender systems or surveys in the social and behavioral sciences, methods for matrix completion are almost always designed for and studied in the context of continuous data. Furthermore, only a small subset of the literature considers matrix completion in the presence of corrupted observations despite their common occurrence in practice. Examples include attacks on recommender systems (i.e., malicious users deliberately manipulating ratings to influence the recommender system to their advantage), or careless respondents in surveys (i.e., respondents providing answers irrespective of what the survey asks of them due to a lack of attention). We introduce a matrix completion algorithm that is tailored towards the discrete nature of rating-scale data and robust to the presence of corrupted observations. In addition, we investigate the performance of the proposed method and its competitors with discrete rating-scale (rather than continuous) data as well as under various missing data mechanisms and types of corrupted observations.
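To make the task concrete, the following is a minimal sketch of generic matrix completion on rating-scale data. It is not the algorithm proposed in the abstract above, and it is not robust to corrupted observations: it uses plain iterated truncated-SVD imputation and then rounds the predictions onto the rating scale. The function name `complete_ratings`, the rank, the iteration count, and the 1-to-5 scale are all assumptions chosen for illustration.

```python
import numpy as np


def complete_ratings(X, rank=2, n_iter=200, scale=(1, 5)):
    """Fill NaN entries of X by iterated truncated-SVD imputation.

    Illustrative only: a generic low-rank baseline, not the robust
    discrete method described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Initialize the missing entries with the mean of the observed ones.
    filled = np.where(observed, X, np.nanmean(X))
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries stay fixed; only missing entries are updated.
        filled = np.where(observed, X, low_rank)
    lo, hi = scale
    # Map continuous predictions onto the discrete rating scale.
    return np.clip(np.rint(filled), lo, hi).astype(int)


# Hypothetical 4 users x 3 items rating matrix with missing entries.
ratings = np.array([
    [5, 4, np.nan],
    [4, np.nan, 1],
    [np.nan, 2, 1],
    [5, 5, 2],
])
completed = complete_ratings(ratings)
print(completed)
```

Rounding a continuous prediction is exactly the kind of post hoc step that a method designed for discrete data would avoid, which is why this sketch should be read only as a baseline for the problem setting.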
Abstract: Although robust statistical estimators are less affected by outlying observations, their computation is usually more challenging. This is particularly the case in high-dimensional sparse settings. The availability of new optimization procedures, mainly developed in the computer science domain, offers new possibilities for the field of robust statistics. This paper investigates how such procedures can be used for robust sparse association estimators. The problem can be split into a robust estimation step followed by an optimization for the remaining decoupled, (bi-)convex problem. A combination of the augmented Lagrangian algorithm and adaptive gradient descent is implemented to also include suitable constraints for inducing sparsity. We provide results concerning the precision of the algorithm and show the advantages over existing algorithms in this context. High-dimensional empirical examples underline the usefulness of this procedure. Extensions to other robust sparse estimators are possible.
Abstract: Questionnaires in the behavioral and organizational sciences tend to be lengthy: survey measures comprising hundreds of items are the norm rather than the exception. However, recent literature suggests that the longer a questionnaire takes, the higher the probability that participants lose interest and start responding carelessly. Consequently, in long surveys a large number of participants may engage in careless responding, posing a major threat to internal validity. We propose a novel method to identify the onset of careless responding (or an absence thereof) for each participant. Specifically, our method is based on combined measurements of up to three dimensions in which carelessness may manifest (inconsistency, invariability, fast responding). Since a structural break in any of these dimensions is potentially indicative of carelessness, our method searches for evidence of changepoints along the three dimensions. Our method is highly flexible, based on machine learning, and provides statistical guarantees on its performance. In simulation experiments, we find that it achieves high reliability in correctly identifying carelessness onset, discriminates well between careless and attentive respondents, and can capture a wide variety of careless response styles, even in datasets with an overwhelming presence of carelessness. In addition, we empirically validate our method on a Big 5 measurement. Furthermore, we provide freely available software in R to enhance accessibility and adoption by empirical researchers.
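The idea of locating a structural break in one dimension can be illustrated with a textbook single-changepoint CUSUM statistic. This is only a sketch under simplifying assumptions: it handles one series at a time, uses no machine learning, and carries none of the statistical guarantees mentioned in the abstract above. The function name `cusum_changepoint` and the example series (for instance, a per-item inconsistency score for one respondent) are hypothetical.

```python
import numpy as np


def cusum_changepoint(series):
    """Locate the most likely single change in mean of a 1-D series.

    Returns (k, stat): k is the number of observations before the
    candidate change (1 <= k < n), and stat is the standardized CUSUM
    statistic at that split. Illustrative only; a real analysis would
    compare stat against a critical value before declaring a change.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    k = np.arange(1, n)
    partial = np.cumsum(x)[:-1]  # sum of the first k observations
    # Deviation of each partial sum from its value under a constant mean,
    # scaled so that splits near the edges are not favored.
    stat = np.abs(partial - k * x.mean()) / np.sqrt(k * (n - k) / n)
    best = int(np.argmax(stat))
    return int(k[best]), float(stat[best])


# Hypothetical score series: attentive for 10 items, then a level shift.
scores = [0.0] * 10 + [1.0] * 10
onset, strength = cusum_changepoint(scores)
print(onset, strength)
```

In this sketch the returned `onset` counts the items answered before the shift, so the item at zero-based position `onset` is the first one after the candidate change.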