Sébastien Giguère

Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance

May 22, 2015

Alexandre Drouin, Sébastien Giguère, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil

Abstract: The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM to datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of features exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the feature space had to be filtered as a preprocessing step.

* Peer-reviewed and accepted for an oral presentation in the Greed is Great workshop at the International Conference on Machine Learning, Lille, France, 2015
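
The greedy procedure that the abstract refers to can be illustrated with a small sketch. This is a simplified version of the textbook SCM for conjunctions, not the large-scale extension the paper contributes. The function names, the penalty parameter `p`, the `max_rules` cap, and the toy data are all assumptions made for illustration. Each genome is represented as the set of features (for example k-mers) it contains, and each candidate rule tests the presence or absence of one feature.

```python
def rule_output(rule, sample):
    """A rule is (feature, expect_present); it is true when the sample agrees."""
    feature, expect_present = rule
    return (feature in sample) == expect_present


def train_scm_conjunction(samples, labels, p=1.0, max_rules=3):
    """Greedily build a conjunction of presence/absence rules (simplified SCM).

    samples: list of sets of features; labels: list of 0/1 values.
    """
    features = sorted(set().union(*samples))
    rules = [(f, present) for f in features for present in (True, False)]
    negatives = {i for i, y in enumerate(labels) if y == 0}
    positives = {i for i, y in enumerate(labels) if y == 1}
    model = []
    while negatives and len(model) < max_rules:
        best = None
        for rule in rules:
            # A rule "covers" a negative example when it is false on it,
            # and makes an error on a positive example when it is false on it.
            covered = {i for i in negatives if not rule_output(rule, samples[i])}
            errors = {i for i in positives if not rule_output(rule, samples[i])}
            utility = len(covered) - p * len(errors)
            if best is None or utility > best[0]:
                best = (utility, rule, covered, errors)
        _, rule, covered, errors = best
        if not covered:
            break  # no remaining rule removes any negative example
        model.append(rule)
        negatives -= covered
        positives -= errors
    return model


def predict(model, sample):
    """A conjunction predicts 1 only when every selected rule is true."""
    return int(all(rule_output(rule, sample) for rule in model))


# Hypothetical toy data: two resistant (1) and two sensitive (0) isolates.
toy_samples = [{"ACG", "TTA"}, {"ACG"}, {"TTA"}, set()]
toy_labels = [1, 1, 0, 0]
toy_model = train_scm_conjunction(toy_samples, toy_labels)
toy_predictions = [predict(toy_model, s) for s in toy_samples]
```

The sparsity mentioned in the abstract shows up here as the small number of rules in `toy_model`; the loop stops as soon as every negative example is covered or the rule budget is reached.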

On the String Kernel Pre-Image Problem with Applications in Drug Discovery

Dec 04, 2014

Sébastien Giguère, Amélie Rolland, François Laviolette, Mario Marchand

Abstract: The pre-image problem has to be solved during inference by most structured output predictors. For string kernels, this problem corresponds to finding the string associated with a given input. An algorithm capable of solving or finding good approximations to this problem would have many applications in computational biology and other fields. This work uses a recent result on combinatorial optimization of linear predictors based on string kernels to develop, for the pre-image, a low-complexity upper bound valid for many string kernels. This upper bound is used successfully in a branch-and-bound search algorithm. Applications and results in the discovery of druggable peptides are presented and discussed.

* Peer-reviewed and accepted for presentation at Machine Learning in Computational Biology 2014, Montréal, Québec, Canada
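
To make the idea of a bound-guided search over strings concrete, here is a minimal best-first branch-and-bound sketch. It is not the bound or the algorithm from the paper. It assumes a toy scoring function in which a string's score is the sum of hypothetical weights of the k-mers it contains, and it uses a deliberately naive upper bound (every k-mer still to be placed earns at most the largest non-negative weight). The function name `best_string` and all parameters are illustrative.

```python
import heapq


def best_string(weights, alphabet, k, length):
    """Find a string of the given length maximizing the sum of its k-mer weights.

    weights: dict mapping k-mers to floats (missing k-mers weigh 0).
    Returns (string, score). Best-first search ordered by an upper bound.
    """
    if length < k:
        raise ValueError("length must be at least k")
    # Naive optimistic gain for each k-mer not yet placed.
    max_gain = max(0.0, max(weights.values(), default=0.0))
    total_kmers = length - k + 1
    # Heap entries: (negated upper bound, prefix, exact score of the prefix).
    heap = [(-(total_kmers * max_gain), "", 0.0)]
    while heap:
        _, prefix, score = heapq.heappop(heap)
        if len(prefix) == length:
            # For a complete string the bound equals its score, and no other
            # node on the heap has a larger bound, so this string is optimal.
            return prefix, score
        for letter in alphabet:
            child = prefix + letter
            child_score = score
            if len(child) >= k:
                child_score += weights.get(child[-k:], 0.0)
            placed = max(0, len(child) - k + 1)
            bound = child_score + (total_kmers - placed) * max_gain
            heapq.heappush(heap, (-bound, child, child_score))
    raise RuntimeError("search space exhausted without a complete string")


# Hypothetical weights over a two-letter alphabet.
toy_weights = {"AB": 2.0, "BA": 1.0, "AA": -1.0}
toy_best, toy_score = best_string(toy_weights, "AB", k=2, length=4)
```

A tighter bound prunes more of the search tree; the naive one used here is only meant to show where such a bound plugs into the search.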

Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine

Dec 02, 2014

Alexandre Drouin, Sébastien Giguère, Vladana Sagatovich, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil

Abstract: The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against four antibiotics. Our results demonstrate that extremely sparse models which are biologically relevant can be learnt using this approach.

* Presented at Machine Learning in Computational Biology 2014, Montréal, Québec, Canada
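
A k-mer representation, as mentioned in the abstract, describes each genome by the length-k substrings it contains. The sketch below shows one simple way to build a presence/absence matrix from raw sequences. The function names and the toy sequences are assumptions for illustration; real genomes would call for a far more memory-efficient encoding than Python sets and nested lists.

```python
def kmer_set(genome, k):
    """Return the set of all length-k substrings of a sequence."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def presence_matrix(genomes, k):
    """Build a binary genome-by-k-mer matrix.

    Returns (vocabulary, matrix) where matrix[i][j] is 1 when
    vocabulary[j] occurs in genomes[i] and 0 otherwise.
    """
    sets = [kmer_set(g, k) for g in genomes]
    vocabulary = sorted(set().union(*sets))
    matrix = [[int(kmer in s) for kmer in vocabulary] for s in sets]
    return vocabulary, matrix


# Hypothetical toy sequences, far shorter than any real genome.
toy_vocabulary, toy_matrix = presence_matrix(["ACGT", "CGTT"], k=3)
```

Each column of the matrix is a boolean feature, which is the kind of input a rule-based learner such as the Set Covering Machine can turn into presence or absence rules.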

Learning a peptide-protein binding affinity predictor with kernel ridge regression

Jul 31, 2012

Sébastien Giguère, Mario Marchand, François Laviolette, Alexandre Drouin, Jacques Corbeil

Abstract: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant results and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.

* BMC Bioinformatics 2013, 14:82
* 22 pages, 4 figures, 5 tables
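
The combination of a string kernel with kernel ridge regression can be sketched in a few lines. The example below does not implement the kernel proposed in the paper; it substitutes a plain spectrum (shared k-mer count) kernel as a stand-in. The function names, the regularization value `lam`, and the toy peptides and affinities are hypothetical. Kernel ridge regression solves (K + lam * I) alpha = y and predicts with a weighted sum of kernel values.

```python
from collections import Counter


def spectrum_kernel(s, t, k=2):
    """Stand-in string kernel: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(count * ct[kmer] for kmer, count in cs.items()))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_krr(peptides, affinities, lam=1e-8, k=2):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = [[spectrum_kernel(a, b, k) for b in peptides] for a in peptides]
    for i in range(len(peptides)):
        K[i][i] += lam
    return solve(K, list(affinities))


def predict_krr(alphas, peptides, query, k=2):
    """Predict as the alpha-weighted sum of kernel values with the query."""
    return sum(a * spectrum_kernel(p, query, k) for a, p in zip(alphas, peptides))


# Hypothetical toy peptides and binding affinities.
toy_peptides = ["AAK", "KKA", "AKA"]
toy_affinities = [1.0, 2.0, 3.0]
toy_alphas = fit_krr(toy_peptides, toy_affinities)
toy_fitted = [predict_krr(toy_alphas, toy_peptides, p) for p in toy_peptides]
```

With a very small `lam` the model nearly interpolates the training affinities; larger values trade training fit for smoother predictions, and swapping in a richer kernel only requires replacing `spectrum_kernel`.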
