By establishing a connection between bi-directional Helmholtz machines and information theory, we propose a generalized Helmholtz machine. Theoretical and experimental results show that given \textit{shallow} architectures, the generalized model outperforms the previous ones substantially.