We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly with a larger-than-exponentially-small probability. This unconditional near-optimal quantum-classical separation originates from the quantum nonlocality property that distinguishes quantum circuits from their classical counterparts. We further derive the noise thresholds for demonstrating such a separation on near-term quantum devices under the depolarization noise model. We prove that this separation will persist if the noise strength is upper bounded by an inverse polynomial with respect to the system size, and vanish if the noise strength is greater than an inverse polylogarithmic function. In addition, for quantum devices with constant noise strength, we prove that no super-polynomial classical-quantum separation exists for any classification task defined by shallow Clifford circuits, independent of the structures of the circuits that specify the learning models.