Abstract: Motivation: Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. (Bioinformatics, 2012) developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group $l_{2,1}$-norm penalty, which encourages structured sparsity at both the gene level and the SNP level. While it incorporates a number of useful features, that method furnishes only a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. Results: We develop a Bayesian hierarchical modeling formulation whose posterior mode corresponds to the estimator proposed by Wang et al. (Bioinformatics, 2012) and which allows for full posterior inference, including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture, and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities and outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort clearly demonstrates the added value of interval estimation over point estimation alone when relating SNPs to brain imaging endophenotypes.
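To make the two-level penalty concrete, the following is a minimal sketch of how a combined gene-level and SNP-level penalty of this kind could be evaluated. It assumes a coefficient matrix `W` with one row per SNP and one column per phenotype, a partition of the rows into genes, and two tuning parameters; the function name `group_l21_penalty`, the argument names, and the exact weighting are illustrative assumptions, not the notation or implementation of Wang et al. or of this paper.

```python
import numpy as np

def group_l21_penalty(W, groups, gamma1, gamma2):
    """Evaluate an assumed two-level sparsity penalty.

    W      : array of shape (n_snps, n_phenotypes), regression coefficients.
    groups : list of row-index lists, one per gene (assumed to partition the SNPs).
    gamma1 : tuning parameter for the gene-level (group) term.
    gamma2 : tuning parameter for the SNP-level (row) term.
    """
    W = np.asarray(W, dtype=float)
    # Gene level: Frobenius norm of each gene's block of rows, summed over genes.
    gene_term = sum(np.linalg.norm(W[np.asarray(idx), :]) for idx in groups)
    # SNP level: Euclidean norm of each row (one SNP across all phenotypes), summed.
    snp_term = np.linalg.norm(W, axis=1).sum()
    return gamma1 * gene_term + gamma2 * snp_term

# Toy example: 3 SNPs, 2 phenotypes, genes {SNP 0, SNP 1} and {SNP 2}.
W_toy = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty_toy = group_l21_penalty(W_toy, groups=[[0, 1], [2]], gamma1=1.0, gamma2=2.0)
```

Because both terms are sums of unsquared norms, a whole gene block or a whole SNP row can be shrunk exactly to zero at the mode, which is the structured sparsity the abstract refers to.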
Abstract: We investigate the choice of tuning parameters for a Bayesian multi-level group lasso model developed for the joint analysis of neuroimaging and genetic data. The regression model we consider relates multivariate phenotypes consisting of brain summary measures (volumetric and cortical thickness values) to single nucleotide polymorphism (SNP) data and imposes penalization at two nested levels, the first corresponding to genes and the second corresponding to SNPs. Associated with each level in the penalty is a tuning parameter, which corresponds to a hyperparameter in the hierarchical Bayesian formulation. Following previous work on Bayesian lassos, we consider the estimation of tuning parameters through either hierarchical Bayes, based on hyperpriors and Gibbs sampling, or empirical Bayes, based on maximizing the marginal likelihood using a Monte Carlo EM algorithm. For the specific model under consideration, we find that these approaches can lead to severe overshrinkage of the regression parameter estimates in the high-dimensional setting or when the genetic effects are weak. We demonstrate these problems through simulation examples and study an approximation to the marginal likelihood which sheds light on the cause of the overshrinkage. We then suggest an alternative approach based on the widely applicable information criterion (WAIC), an asymptotic approximation to leave-one-out cross-validation that can be computed conveniently within an MCMC framework.
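As an illustration of why WAIC is convenient within MCMC, the sketch below computes it from a matrix of pointwise log-likelihood values evaluated at posterior draws. It uses the common variance-based form of the effective number of parameters on the deviance scale; the function name `waic`, the input layout (draws in rows, observations in columns), and the choice of this particular variant are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def waic(log_lik):
    """Compute WAIC on the deviance scale (smaller is better).

    log_lik : array of shape (S, n); entry (s, i) is the log-likelihood of
              observation i evaluated at posterior draw s (assumed layout).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed with a max-shift for numerical stability.
    col_max = log_lik.max(axis=0)
    log_mean_lik = col_max + np.log(np.exp(log_lik - col_max).sum(axis=0)) - np.log(n_draws)
    lppd = log_mean_lik.sum()
    # Effective number of parameters: posterior variance of the log-likelihood,
    # summed over observations.
    p_waic = np.var(log_lik, axis=0, ddof=1).sum()
    return -2.0 * (lppd - p_waic)

# Degenerate check: identical draws give zero variance, so WAIC = -2 * sum(log-lik).
waic_const = waic(np.full((4, 3), -1.0))
```

One possible use, stated here as an assumption rather than the authors' procedure, is to run the Gibbs sampler at each point of a grid of tuning parameter values, store the pointwise log-likelihoods, and retain the values giving the smallest WAIC.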