We propose a novel information-theoretic objective function for learning robust deep representations of data. Data is projected into a feature-vector space such that the mutual information between every subset of features and the supervising signal is maximized. This objective gives rise to robust representations by preserving the information available about the supervising signal when features are noisy or unavailable. Although the objective function is not directly tractable, we derive a tractable surrogate objective function. Minimizing this surrogate loss encourages features to be non-redundant and conditionally independent given the supervising signal. Preliminary experiments evaluating the quality of the obtained solutions show promising results.
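To make the quantity in the objective concrete, the following minimal sketch computes the mutual information between every non-empty subset of features and the supervising signal on a toy dataset. It is an illustration under stated assumptions, not the method proposed here: it assumes discrete features and labels, uses a plain plug-in (empirical frequency) estimator, and does not implement the surrogate objective. The names `mutual_information` and `subset_information` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def subset_information(features, labels):
    """Map each non-empty subset of feature indices S to I(F_S; Y)."""
    d = len(features[0])
    result = {}
    for k in range(1, d + 1):
        for subset in combinations(range(d), k):
            # Keep only the features in this subset (the rest are "unavailable").
            projected = [tuple(row[i] for i in subset) for row in features]
            result[subset] = mutual_information(projected, labels)
    return result


# Toy data (assumed for illustration): a uniform 4-class label encoded
# by two binary features, f1 = y // 2 and f2 = y % 2.
labels = [0, 1, 2, 3]
features = [(y // 2, y % 2) for y in labels]

info = subset_information(features, labels)
for subset, bits in info.items():
    print(subset, round(bits, 3))
```

In this toy case each single feature carries 1 bit about the label and the pair carries 2 bits, so losing either feature still leaves half of the label information available. Enumerating all subsets grows exponentially with the number of features, which is one way to see why the objective is not directly tractable.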