We propose a new framework for single-channel source separation that lies between the fully supervised and unsupervised setting. Instead of supervision, we provide input features for each source signal and use convex methods to estimate the correlations between these features and the unobserved signal decomposition. We analyze the case of $\ell_2$ loss theoretically and show that recovery of the signal components depends only on cross-correlation between features for different signals, not on correlations between features for the same signal. Contextually supervised source separation is a natural fit for domains with large amounts of data but no explicit supervision; our motivating application is energy disaggregation of hourly smart meter data (the separation of whole-home power signals into different energy uses). Here we apply contextual supervision to disaggregate the energy usage of thousands homes over four years, a significantly larger scale than previously published efforts, and demonstrate on synthetic data that our method outperforms the unsupervised approach.