DP-BandMF offers a powerful approach to differentially private machine learning, balancing privacy amplification with noise correlation for optimal noise reduction. However, its scalability has been limited to settings where the number of training iterations is less than $10^4$. In this work, we present techniques that significantly extend DP-BandMF's reach, enabling use in settings with and over $10^6$ training iterations. Our enhanced implementation, coupled with extensive experiments, provides clear guidelines on selecting the optimal number of bands. These insights offer practitioners a deeper understanding of DP-BandMF's performance and how to maximize its utility for privacy-preserving machine learning.