Objective: A variety of pattern analysis techniques for model training in brain interfaces exploit neural feature dimensionality reduction based on feature ranking and selection heuristics. In light of broad evidence demonstrating the potential sub-optimality of ranking-based feature selection by any criterion, we propose to extend this focus with a feature transformation concept driven by information theoretic learning. Methods: We present a maximum mutual information linear transformation (MMI-LinT) and a nonlinear transformation (MMI-NonLinT) framework derived from a general definition of the feature transformation learning problem. Empirical assessments are performed on electroencephalographic (EEG) data recorded during a four-class motor imagery brain-computer interface (BCI) task. Exploiting state-of-the-art methods for initial feature vector construction, we compare the proposed approaches with conventional dimensionality reduction techniques based on feature selection, which are widely used in brain interfaces. Furthermore, for the multi-class problem, we present and exploit a BCI decoding system based on a hierarchical graphical model. Results: Both binary and multi-class decoding analyses demonstrate significantly better performance with the proposed methods. Conclusion: Information theoretic feature transformations are capable of tackling potential confounders of conventional approaches in various settings. Significance: We argue that this concept provides significant insights to extend the focus on feature selection heuristics to a broader definition of feature transformation learning in brain interfaces.