This paper introduces an iterative algorithm designed to train additive models with favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mispecification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we prove that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample.