Abstract: To benefit from the modeling capacity of deep models in system identification without incurring their inference-time cost, this study presents a novel training strategy that uses deep models only at the training stage. For this purpose, two separate models with different structures and goals are employed. The first is a deep generative model that models the distribution of the system output(s), called the teacher model; the second is a shallow basis-function model, named the student model, that is fed the system input(s) to predict the system output(s). These two isolated paths must therefore reach the same ultimate target. Since deep models perform well in modeling highly nonlinear systems, aligning the representation spaces learned by the two models lets the student model inherit the approximation power of the teacher model. The proposed objective function combines the individual objectives of the teacher and student models with a distance penalty between their learned latent representations. Simulation results on three nonlinear benchmarks show performance comparable to that of deep architectures applied to the same benchmarks. Algorithmic transparency and structural efficiency are also achieved as byproducts.
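As a sketch only (the abstract does not give the exact formulation), the combined objective described above could take the form

\[
\mathcal{L} = \mathcal{L}_{\mathrm{teacher}} + \mathcal{L}_{\mathrm{student}} + \lambda \, d\!\left(z_{T}, z_{S}\right),
\]

where \(\mathcal{L}_{\mathrm{teacher}}\) and \(\mathcal{L}_{\mathrm{student}}\) are the individual model objectives, \(z_{T}\) and \(z_{S}\) are the latent representations learned by the teacher and student models, \(d(\cdot,\cdot)\) is a distance between them, and \(\lambda\) weights the alignment penalty; all symbols here are assumed notation for illustration, not taken from the paper.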