Abstract: We present an approach to compute the monetary value of individual data points in the context of an automated decision system. The proposed method enables us to explore and implement a paradigm of data minimalism for large-scale machine learning systems. Data-minimalistic implementations enhance scalability while maintaining or even improving a system's performance. Using two types of recommender systems, we first demonstrate how much of the data is ineffective in both settings. We then present a general account of computing data value via sensitivity analysis and show how, in theory, individual data points can be priced according to their informational contribution to automated decisions. We further apply this method to lab-scale recommender systems and outline next steps towards commercial data-minimalistic applications.
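To make the idea of valuing data via sensitivity analysis concrete, the sketch below uses leave-one-out sensitivity on a toy item-to-item co-occurrence recommender: a data point (here, one user session) is worth as much as the number of items whose top-k recommendations change when that point is removed. The recommender, the change count as the sensitivity measure, and the proportional pricing rule in `prices` are all illustrative assumptions, not the authors' method. Note how the duplicated session receives a value of zero, which is the kind of ineffective data the abstract refers to.

```python
from collections import Counter
from itertools import combinations


def recommend(sessions, k=1):
    """Toy item-to-item recommender (assumption): top-k co-occurring items per item."""
    counts = {}
    for session in sessions:
        # Count each unordered item pair once per session.
        for a, b in combinations(sorted(set(session)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    recs = {}
    for item, c in counts.items():
        # Break ties by item name so results are deterministic.
        ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[item] = [other for other, _ in ranked[:k]]
    return recs


def leave_one_out_values(sessions, k=1):
    """Sensitivity of each data point: number of items whose recommendations change without it."""
    baseline = recommend(sessions, k)
    values = []
    for i in range(len(sessions)):
        reduced = recommend(sessions[:i] + sessions[i + 1:], k)
        items = set(baseline) | set(reduced)
        # An item that disappears entirely also counts as a changed decision.
        changed = sum(baseline.get(it, []) != reduced.get(it, []) for it in items)
        values.append(changed)
    return values


def prices(values, budget):
    """Hypothetical pricing rule: split a budget in proportion to sensitivity."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [budget * v / total for v in values]


sessions = [["a", "b"], ["a", "b"], ["a", "c"], ["d", "e"]]
values = leave_one_out_values(sessions, k=1)
print(values)  # [0, 0, 1, 2]
print(prices(values, budget=1.0))
```

Leave-one-out retraining is exact but costs one model rebuild per data point; at large scale one would typically replace it with a cheaper approximation of the same sensitivity.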
Abstract: Predictive geometric models deliver excellent results for many machine learning use cases. Despite their undoubted performance, neural predictive algorithms can show unexpected degrees of instability and variance, particularly when applied to large datasets. We present an approach to measure changes in geometric models with respect to both output consistency and topological stability. Using the example of a recommender system based on word2vec, we analyze the influence of single data points, approximation methods, and parameter settings. Our findings can help to stabilize models where needed and to detect differences in the informational value of data points on a large scale.
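One common way to quantify the topological stability of an embedding model is to compare each item's top-k nearest neighbors (by cosine similarity) across two trained models and average the Jaccard overlap. The sketch below assumes that measure; it is not necessarily the metric used in the paper. The toy 2-D vectors stand in for item embeddings exported from two word2vec runs, and the helper names are hypothetical. Because the measure only looks at neighborhoods, a rotated copy of the same geometry scores 1.0, whereas a run in which two items swap places scores 0.0.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbors(emb, item, k):
    """Set of the k items most cosine-similar to `item` (ties broken by name)."""
    scored = [(other, cosine(emb[item], vec)) for other, vec in emb.items() if other != item]
    scored.sort(key=lambda p: (-p[1], p[0]))
    return {other for other, _ in scored[:k]}


def neighborhood_stability(emb_a, emb_b, k=1):
    """Mean Jaccard overlap of top-k neighborhoods across two models (assumed metric)."""
    shared = sorted(set(emb_a) & set(emb_b))
    if len(shared) < 2:
        raise ValueError("need at least two items shared by both models")
    # Restrict both models to the shared vocabulary so neighborhoods are comparable.
    a = {w: emb_a[w] for w in shared}
    b = {w: emb_b[w] for w in shared}
    per_item = {}
    for w in shared:
        na, nb = neighbors(a, w, k), neighbors(b, w, k)
        per_item[w] = len(na & nb) / len(na | nb)
    return sum(per_item.values()) / len(per_item), per_item


run_a = {"x": [1.0, 0.0], "y": [0.9, 0.1], "z": [0.0, 1.0], "w": [0.1, 0.9]}
# Same geometry, rotated by 90 degrees (coordinates differ, neighborhoods do not).
run_b = {item: [-v[1], v[0]] for item, v in run_a.items()}
# A run in which y and w swap places.
run_c = {"x": [1.0, 0.0], "y": [0.1, 0.9], "z": [0.0, 1.0], "w": [0.9, 0.1]}
print(neighborhood_stability(run_a, run_b)[0])  # 1.0
print(neighborhood_stability(run_a, run_c)[0])  # 0.0
```

The per-item scores returned alongside the mean show which items' neighborhoods are least stable, which is where the influence of single data points or parameter settings would be examined first.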