Abstract:Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently handling the nested structure. This paper introduces a novel gradient-based approach for multilevel optimization that overcomes these limitations by leveraging a hierarchically structured decomposition of the full gradient and employing advanced propagation techniques. Extending to n-level scenarios, our method significantly reduces computational complexity while improving both solution accuracy and convergence speed. We demonstrate the effectiveness of our approach through numerical experiments, comparing it with existing methods across several benchmarks. The results show a notable improvement in solution accuracy. To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation with both theoretical guarantees and superior empirical performance.