Abstract:The Tweedie exponential dispersion family is a popular choice among many to model insurance losses that consist of zero-inflated semicontinuous data. In such data, it is often important to obtain credibility (inference) of the most important features that describe the endogenous variables. Post-selection inference is the standard procedure in statistics to obtain confidence intervals of model parameters after performing a feature extraction procedure. For a linear model, the lasso estimate often has non-negligible estimation bias for large coefficients corresponding to exogenous variables. To have valid inference on those coefficients, it is necessary to correct the bias of the lasso estimate. Traditional statistical methods, such as hypothesis testing or standard confidence interval construction might lead to incorrect conclusions during post-selection, as they are generally too optimistic. Here we discuss a few methodologies for constructing confidence intervals of the coefficients after feature selection in the Generalized Linear Model (GLM) family with application to insurance data.
Abstract:Opioids are an effective analgesic for acute and chronic pain, but also carry a considerable risk of addiction leading to millions of opioid use disorder (OUD) cases and tens of thousands of premature deaths in the United States yearly. Estimating OUD risk prior to prescription could improve the efficacy of treatment regimens, monitoring programs, and intervention strategies, but risk estimation is typically based on self-reported data or questionnaires. We develop an experimental design and computational methods that combines genetic variants associated with OUD with behavioral features extracted from GPS and Wi-Fi spatiotemporal coordinates to assess OUD risk. Since both OUD mobility and genetic data do not exist for the same cohort, we develop algorithms to (1) generate mobility features from empirical distributions and (2) synthesize mobility and genetic samples assuming a level of comorbidity and relative risks. We show that integrating genetic and mobility modalities improves risk modelling using classification accuracy, area under the precision-recall and receiver operator characteristic curves, and $F_1$ score. Interpreting the fitted models suggests that mobility features have more influence on OUD risk, although the genetic contribution was significant, particularly in linear models. While there exists concerns with respect to privacy, security, bias, and generalizability that must be evaluated in clinical trials before being implemented in practice, our framework provides preliminary evidence that behavioral and genetic features may improve OUD risk estimation to assist with personalized clinical decision-making.