Fabian Scheidt

Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop

Sep 27, 2024

Fabian Scheidt, Jasin Machkour, Michael Muma

Abstract: Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive, and we introduce two new dummy generation strategies based on permutations of a reference matrix. Our numerical experiments demonstrate a drastic reduction in memory demand and computation time. We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes. Our work empowers researchers without access to high-performance clusters to make reproducible discoveries in large-scale high-dimensional data.
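
The two ideas named in the abstract, matrices that live on disk via memory mapping and dummies obtained by permuting a reference matrix, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `create_reference` and `make_permuted_dummies`, the use of `numpy.memmap`, the Gaussian reference entries, and the choice of a row permutation are all assumptions made for illustration, and the abstract does not specify which permutation strategies the Big T-Rex actually uses.

```python
import numpy as np


def create_reference(path, n, p, seed, block=1024):
    """Fill a disk-backed n x p reference matrix with standard normal entries.

    Hypothetical helper: the matrix is written in column blocks so that only
    n * block values are held in RAM at any time.
    """
    rng = np.random.default_rng(seed)
    # order="F" stores columns contiguously, so column blocks are contiguous on disk.
    ref = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, p), order="F")
    for start in range(0, p, block):
        stop = min(start + block, p)
        ref[:, start:stop] = rng.standard_normal((n, stop - start))
    ref.flush()
    return ref


def make_permuted_dummies(reference, out_path, seed, block=1024):
    """Write a row-permuted copy of `reference` to a second disk-backed matrix.

    Assumption: one row permutation per random experiment. This reuses the
    reference entries instead of drawing a fresh dummy matrix each time.
    """
    rng = np.random.default_rng(seed)
    n, p = reference.shape
    dummies = np.memmap(out_path, dtype=reference.dtype, mode="w+",
                        shape=(n, p), order="F")
    perm = rng.permutation(n)  # same row shuffle applied to every column block
    for start in range(0, p, block):
        stop = min(start + block, p)
        dummies[:, start:stop] = reference[perm, start:stop]
    dummies.flush()
    return dummies
```

A caller would create the reference once, for example `ref = create_reference("ref.dat", n=200, p=100_000, seed=0)`, and then derive one dummy matrix per random experiment with a different seed. Because both arrays are memory-mapped, peak RAM use is governed by the block size rather than by the number of dummy variables.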

* 2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
* Conference article (IEEE CAMSAP 2023), 5 pages, 7 figures
