Christina Heinze

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

Jan 08, 2016

Christina Heinze, Brian McWilliams, Nicolai Meinshausen

Abstract: We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication, in which low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error that depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.

* Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 51, 2016, 12 pages
* 13 pages

backShift: Learning causal cyclic graphs from unknown shift interventions

Nov 18, 2015

Dominik Rothenhäusler, Christina Heinze, Jonas Peters, Nicolai Meinshausen

Abstract: We propose a simple method to learn linear causal cyclic models in the presence of latent variables. The method relies on equilibrium data of the model recorded under a specific kind of intervention ("shift interventions"). The location and strength of these interventions do not have to be known and can be estimated from the data. Our method, called backShift, uses only second moments of the data and performs simple joint matrix diagonalization, applied to differences between covariance matrices. We give a sufficient and necessary condition for identifiability of the system, which is fulfilled almost surely under some quite general assumptions if and only if there are at least three distinct experimental settings, one of which can be pure observational data. We demonstrate the performance on simulated data and on applications in flow cytometry and financial time series. The code is made available as the R package backShift.

* Advances in Neural Information Processing Systems 28 (2015) 1513-1521
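
The role of covariance differences in the abstract above can be illustrated with a small numerical check. This is a minimal sketch and not the backShift estimator: it assumes the equilibrium model x = Bx + c + e with mutually uncorrelated shift components c, a model form the abstract does not spell out. The matrix `B`, the shift variances, and the helper names are hypothetical. The check confirms that, under this assumption, I - B diagonalizes every difference between an interventional and the observational covariance, even when the noise covariance is not diagonal.

```python
import numpy as np

def equilibrium_cov(B, cov_noise, shift_var):
    """Population covariance of x at equilibrium of x = B x + c + e (assumed model)."""
    p = B.shape[0]
    A = np.linalg.inv(np.eye(p) - B)  # x = A (c + e)
    return A @ (cov_noise + np.diag(shift_var)) @ A.T

def off_diag_norm(M):
    """Frobenius norm of the off-diagonal part of M."""
    return np.linalg.norm(M - np.diag(np.diag(M)))

rng = np.random.default_rng(0)
p = 4
# Hypothetical connectivity matrix with a cycle 0 -> 1 -> 2 -> 0; B[i, j] is the effect of j on i.
B = np.array([[0.0, 0.0, 0.4, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, -0.3, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.0]])

# Non-diagonal noise covariance, mimicking latent confounding.
M = rng.standard_normal((p, p))
cov_noise = M @ M.T + np.eye(p)

# Three settings: observational (no shift) plus two with unknown shift variances.
shift_vars = [np.zeros(p),
              np.array([1.0, 0.5, 2.0, 0.2]),
              np.array([0.3, 2.0, 0.7, 1.5])]
covs = [equilibrium_cov(B, cov_noise, v) for v in shift_vars]

# (I - B) (Sigma_j - Sigma_0) (I - B)^T should be diagonal for every setting j.
D = np.eye(p) - B
diffs = [D @ (S - covs[0]) @ D.T for S in covs[1:]]
for d in diffs:
    print(off_diag_norm(d))
```

The noise covariance cancels in each difference, so only the diagonal shift covariance remains. Estimating B from data would additionally require a joint diagonalization routine, which this sketch leaves out.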

LOCO: Distributing Ridge Regression with Random Projections

Jun 08, 2015

Christina Heinze, Brian McWilliams, Nicolai Meinshausen, Gabriel Krummenacher

Abstract: We propose LOCO, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections which are cheap to compute and need to be communicated only once. We show that LOCO obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We verify this experimentally in a simulation study as well as an application to climate prediction. Furthermore, we show that LOCO achieves significant speedups compared with a state-of-the-art distributed algorithm on a large-scale regression problem.

* 37 pages
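
The idea of splitting features across workers and sharing compressed copies once can be sketched in a single process. This is a rough illustration under stated assumptions, not the authors' implementation: a random orthonormal projection (QR of a Gaussian matrix) stands in for the structured random projections named in the abstract, each simulated worker is assumed to concatenate its raw block with the other workers' projected blocks, and the names `ridge`, `random_projection`, and `loco_style_ridge` are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def random_projection(p_in, p_out, rng):
    """Random p_in x p_out matrix with orthonormal columns, rescaled by sqrt(p_in / p_out).

    Stand-in for a structured random projection; requires p_out <= p_in.
    """
    Q, _ = np.linalg.qr(rng.standard_normal((p_in, p_out)))
    return np.sqrt(p_in / p_out) * Q

def loco_style_ridge(X, y, n_workers, proj_dim, lam, seed=0):
    """Simulate feature-wise distributed ridge with one round of shared projections."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(X.shape[1]), n_workers)
    # Single "communication round": each worker compresses its own block once.
    sketches = [X[:, idx] @ random_projection(len(idx), proj_dim, rng) for idx in blocks]
    beta = np.zeros(X.shape[1])
    for k, idx in enumerate(blocks):
        # Worker k sees its raw features plus the other workers' compressed features.
        others = [s for j, s in enumerate(sketches) if j != k]
        X_local = np.hstack([X[:, idx]] + others)
        coef = ridge(X_local, y, lam)
        beta[idx] = coef[:len(idx)]  # keep only the coefficients of the raw block
    return beta

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n, p = 200, 12
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

beta_exact = ridge(X, y, lam=1.0)
beta_full = loco_style_ridge(X, y, n_workers=3, proj_dim=4, lam=1.0)        # no compression
beta_compressed = loco_style_ridge(X, y, n_workers=3, proj_dim=2, lam=1.0)  # 4 -> 2 per block
print(np.linalg.norm(beta_full - beta_exact))
print(np.linalg.norm(beta_compressed - beta_exact))
```

When `proj_dim` equals the block size, each projection is a rotation, and because ridge regression is equivariant under rotations of the features, the sketch reproduces the exact ridge coefficients. A smaller `proj_dim` reduces what each worker must send at the cost of some approximation error.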
