CBIO
Abstract:Finding inclusion-minimal "hitting sets" for a given collection of sets is a fundamental combinatorial problem with applications in domains as diverse as Boolean algebra, computational biology, and data mining. Much of the algorithmic literature focuses on the problem of *recognizing* the collection of minimal hitting sets; however, in many of the applications, it is more important to *generate* these hitting sets. We survey twenty algorithms from across a variety of domains, considering their history, classification, useful features, and computational performance on a variety of synthetic and real-world inputs. We also provide a suite of implementations of these algorithms with a ready-to-use, platform-agnostic interface based on Docker containers and the AlgoRun framework, so that interested computational scientists can easily perform similar tests with inputs from their own research areas on their own computers or through a convenient Web interface.
Abstract:Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/~ahaury. Running TIGRESS online is possible on GenePattern: http://www.broadinstitute.org/cancer/software/genepattern/.