The macroscopic traffic flow model is widely used for traffic control and management. To incorporate drivers' anticipative behaviors and to remove impractical speed discontinuity inherent in the classic Lighthill-Whitham-Richards (LWR) traffic model, nonlocal partial differential equation (PDE) models with ``look-ahead" dynamics have been proposed, which assume that the speed is a function of weighted downstream traffic density. However, it lacks data validation on two important questions: whether there exist nonlocal dynamics, and how the length and weight of the ``look-ahead" window affect the spatial temporal propagation of traffic densities. In this paper, we adopt traffic trajectory data from a ring-road experiment and design a physics-informed neural network to learn the fundamental diagram and look-ahead kernel that best fit the data, and reinvent a data-enhanced nonlocal LWR model via minimizing the loss function combining the data discrepancy and the nonlocal model discrepancy. Results show that the learned nonlocal LWR yields a more accurate prediction of traffic wave propagation in three different scenarios: stop-and-go oscillations, congested, and free traffic. We first demonstrate the existence of ``look-ahead" effect with real traffic data. The optimal nonlocal kernel is found out to take a length of around 35 to 50 meters, and the kernel weight within 5 meters accounts for the majority of the nonlocal effect. Our results also underscore the importance of choosing a priori physics in machine learning models.