Abstract:We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance (predictive expressiveness) could be lost, using the cross entropy risk, when a given encoder-decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder-decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder-decoder design where necessary and sufficiency conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.