Abstract: We present a scalable machine learning (ML) framework for predicting intensive properties, and in particular for classifying phases, of many-body systems. Scalability and transferability are central to the unprecedented computational efficiency of ML methods. In general, linear-scaling computation can be achieved through a divide-and-conquer approach, and the locality of physical properties is key to partitioning the system into sub-domains that can be solved separately. Based on this locality assumption, an ML model is developed to predict intensive properties of a finite-size block. Predictions for large-scale systems can then be obtained by averaging the outputs of the ML model over randomly sampled blocks of the system. We show that the applicability of this approach depends on whether the block size of the ML model is greater than the characteristic length scale of the system. In particular, for phase identification across a critical point, the accuracy of the ML prediction is limited by the diverging correlation length. The two-dimensional Ising model is used to demonstrate the proposed framework. We obtain an intriguing scaling relation between the prediction accuracy and the ratio of the ML block size to the spin-spin correlation length. Implications for practical applications are also discussed.
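
The block-averaging step described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the model used in this work: `block_model` is a hypothetical stand-in (the absolute magnetization per spin of a block) for a trained ML model, and the function names, the periodic boundary conditions, and the lattice and block sizes are all chosen here for illustration.

```python
import random


def sample_block(spins, block_size, rng):
    """Cut a block_size x block_size block at a random origin.

    Periodic wrap-around is an assumption made for this sketch.
    """
    n = len(spins)
    i0 = rng.randrange(n)
    j0 = rng.randrange(n)
    return [
        [spins[(i0 + di) % n][(j0 + dj) % n] for dj in range(block_size)]
        for di in range(block_size)
    ]


def block_model(block):
    """Hypothetical stand-in for a trained ML model.

    Returns |magnetization per spin| of the block, a value in [0, 1].
    """
    total = sum(sum(row) for row in block)
    return abs(total) / (len(block) * len(block[0]))


def predict_intensive(spins, block_size, n_blocks, model=block_model, seed=0):
    """Average the block-level predictions over randomly sampled blocks."""
    rng = random.Random(seed)
    preds = [model(sample_block(spins, block_size, rng)) for _ in range(n_blocks)]
    return sum(preds) / len(preds)


# Toy 16 x 16 spin configurations (values +1 / -1), not simulation data.
ordered = [[1] * 16 for _ in range(16)]
lattice_rng = random.Random(1)
disordered = [[lattice_rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]

print(predict_intensive(ordered, block_size=4, n_blocks=200))
print(predict_intensive(disordered, block_size=4, n_blocks=200))
```

Because each block is evaluated independently, the cost of a prediction grows with the number of sampled blocks rather than with the full system size. The sketch does not address the central caveat of the abstract: a block smaller than the correlation length may not carry enough information for a reliable prediction.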