We investigate the use of a learning-based strategy to stabilize a synthetic toggle switch through external control. To overcome the data-efficiency problem that would otherwise render the algorithm infeasible for practical use in synthetic biology, we adopt a sim-to-real paradigm: the policy is learnt by training on a simplified model of the toggle switch and is then exploited to control a more realistic model of the switch, parameterized from in vivo experiments. Our in silico experiments confirm the viability of the approach, suggesting its potential for in vivo control implementations.