Optimal Power Flow (OPF) is a valuable tool for power system operators, but it is difficult to solve for large systems. Machine Learning (ML) algorithms, especially Neural Network (NN)-based optimization proxies, have emerged as a promising new tool for solving OPF, as they estimate the OPF solution much faster than traditional methods. However, these ML algorithms act as black boxes, and it is hard to assess their worst-case performance across the entire range of possible inputs that an OPF can have. Previous work has proposed a mixed-integer programming-based methodology to quantify the worst-case violations caused by an NN trained to estimate the OPF solution, throughout the entire input domain. This approach, however, does not scale well to large power systems and more complex NN models. This paper addresses these issues by proposing a scalable algorithm to compute worst-case violations of NN proxies used for approximating OPF solutions of large power systems within a reasonable time limit. This will help build trust in ML models to be deployed in large industry-scale power grids.