Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Our results are both empirical, based on a range of test functions and problems in Bayesian inference, and theoretical, based on an analysis of the variance reduction that can be achieved.