Abstract: Mobile network operator (MNO) data are a rich source for official statistics on topics such as present population, mobility, migration, and tourism. Estimating the geographic location of mobile devices is an essential step for statistical inference. Most studies use a Voronoi tessellation for this, which assumes that mobile devices are always connected to the nearest radio cell. This paper uses a modular Bayesian approach, allowing for different modules of prior knowledge about where devices are expected to be, and different modules for the likelihood of connection given a geographic location. We discuss and compare several prior modules, including one that is based on land use. We show that the Voronoi tessellation can be used as a likelihood module. Alternatively, we propose a signal strength model using radio cell properties such as antenna height, propagation direction, and power. Using Bayes' rule, we derive a posterior probability distribution that is an estimate of the geographic location and can be used for further statistical inference. We describe the method and illustrate it with a fictional example that resembles a real-world situation. The method has been implemented in the R packages mobloc and mobvis, which are briefly described.
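The modular combination of priors and likelihoods described above can be illustrated with a minimal sketch. It is not the mobloc API and not the paper's signal strength model: the one-dimensional tile layout, the cell positions, the land use weights, and the exponential decay function are all assumptions chosen only to show how Bayes' rule turns a prior P(tile) and a likelihood P(cell | tile) into a posterior P(tile | cell).

```python
import math

# Hypothetical 1-D strip of grid tiles (centre positions in metres)
# and two radio cells. All values are illustrative assumptions.
tiles = [0.0, 100.0, 200.0, 300.0, 400.0]
cells = {"A": 50.0, "B": 300.0}


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Prior modules: P(tile), where devices are expected to be.
uniform_prior = normalise([1.0] * len(tiles))
# Assumed land use weights (e.g. built-up tiles weigh more).
land_use_prior = normalise([0.1, 1.0, 1.0, 0.1, 0.1])


def voronoi_likelihood(cell):
    """P(cell | tile): 1 if `cell` is the nearest cell to the tile, else 0."""
    out = []
    for x in tiles:
        nearest = min(cells, key=lambda c: abs(cells[c] - x))
        out.append(1.0 if nearest == cell else 0.0)
    return out


def signal_likelihood(cell, decay=100.0):
    """P(cell | tile) from a toy signal model (an assumption, not the
    paper's model): strength decays exponentially with distance and is
    normalised over all cells for each tile."""
    out = []
    for x in tiles:
        strength = {c: math.exp(-abs(pos - x) / decay) for c, pos in cells.items()}
        out.append(strength[cell] / sum(strength.values()))
    return out


def posterior(prior, likelihood):
    """Bayes' rule: P(tile | cell) is proportional to P(tile) * P(cell | tile)."""
    return normalise([p * l for p, l in zip(prior, likelihood)])


if __name__ == "__main__":
    for name, prior in [("uniform", uniform_prior), ("land use", land_use_prior)]:
        print(name, "Voronoi:", posterior(prior, voronoi_likelihood("A")))
        print(name, "signal: ", posterior(prior, signal_likelihood("A")))
```

In this sketch the Voronoi likelihood confines the posterior to the tiles nearest cell A, whereas the toy signal likelihood leaves some probability on every tile. Swapping the prior changes how that probability is distributed within the same likelihood.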
Abstract: Many mission-critical applications of machine learning (ML) in the real world require a quality assurance (QA) process before the decisions or predictions of an ML model can be deployed. Because QA4ML users have to view a non-trivial amount of data and perform many input actions to correct errors made by the ML model, an optimally designed user interface (UI) can reduce the cost of interactions significantly. A UI's effectiveness can be affected by many factors, such as the number of data objects processed concurrently, the types of commands for correcting errors, and the availability of algorithms for assisting users. We propose using simulation to aid the design and optimization of intelligent user interfaces for QA4ML processes. In particular, we focus on simulating the combined effects of human intelligence in selecting appropriate commands and algorithms, and machine intelligence in providing a collection of general-purpose algorithms for reordering data objects to be quality-assured.