Time series data constitutes a distinct and growing problem in machine learning. As the corpus of time series data grows larger, deep models that simultaneously learn features and classify with these features can be intractable or suboptimal. In this paper, we present feature learning via long short term memory (LSTM) networks and prediction via gradient boosting trees (XGB). Focusing on the consequential setting of electronic health record data, we predict the occurrence of hypoxemia five minutes into the future based on past features. We make two observations: 1) long short term memory networks are effective at capturing long term dependencies based on a single feature and 2) gradient boosting trees are capable of tractably combining a large number of features including static features like height and weight. With these observations in mind, we generate features by performing "supervised" representation learning with LSTM networks. Augmenting the original XGB model with these features gives significantly better performance than either individual method.