Abstract: The discipline of automatic control increasingly draws on concepts that originate from machine learning. Among these, reinforcement learning (RL) plays a prominent role, as it is inherently designed for sequential decision making and can be applied to optimal control problems without a model of the plant. To advance the education of control engineers and operators in this field, this contribution presents an RL framework that can be applied to educational hardware provided by the Lucas-Nülle company. Specifically, inverted pendulum control is pursued by means of RL, covering both swing-up and stabilization within a single holistic design approach. Learning is enabled by separating the corresponding computations from the real-time control computer and outsourcing them to separate hardware. This distributed architecture, however, requires communication between the components involved, which is realized via CAN bus. An experimental proof of concept is presented, including a safeguarding algorithm that prevents harmful operation of the plant during the trial-and-error training phase.