Sequential-in-time methods solve a sequence of training problems to fit nonlinear parametrizations such as neural networks to approximate solution trajectories of partial differential equations over time. This work shows that sequential-in-time training methods can be understood broadly as either optimize-then-discretize (OtD) or discretize-then-optimize (DtO) schemes, which are well known concepts in numerical analysis. The unifying perspective leads to novel stability and a posteriori error analysis results that provide insights into theoretical and numerical aspects that are inherent to either OtD or DtO schemes such as the tangent space collapse phenomenon, which is a form of over-fitting. Additionally, the unified perspective facilitates establishing connections between variants of sequential-in-time training methods, which is demonstrated by identifying natural gradient descent methods on energy functionals as OtD schemes applied to the corresponding gradient flows.