A major challenge in the field of education is providing review schedules that present learned items at appropriate intervals to each student so that memory is retained over time. In recent years, attempts have been made to formulate item reviews as sequential decision-making problems to realize adaptive instruction based on the knowledge state of students. It has been reported previously that reinforcement learning can help realize mathematical models of students learning strategies to maintain a high memory rate. However, optimization using reinforcement learning requires a large number of interactions, and thus it cannot be applied directly to actual students. In this study, we propose a framework for optimizing teaching strategies by constructing a virtual model of the student while minimizing the interaction with the actual teaching target. In addition, we conducted an experiment considering actual instructions using the mathematical model and confirmed that the model performance is comparable to that of conventional teaching methods. Our framework can directly substitute mathematical models used in experiments with human students, and our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.