In sequential machine teaching, a teacher's objective is to provide the optimal sequence of inputs to sequential learners in order to guide them towards the best model. In this paper we extend this setting from current static one-data-set analyses to learners which change their learning algorithm or latent state to improve during learning, and to generalize to new datasets. We introduce a multi-agent formulation in which learners' inner state may change with the teaching interaction, which affects the learning performance in future tasks. In order to teach such learners, we propose an optimal control approach that takes the future performance of the learner after teaching into account. This provides tools for modelling learners having inner states, and machine teaching of meta-learning algorithms. Furthermore, we distinguish manipulative teaching, which can be done by effectively hiding data and also used for indoctrination, from more general education which aims to help the learner become better at generalization and learning in new datasets in the absence of a teacher.