Recent research has shown that integrating artificial intelligence (AI) into wireless communication systems can significantly improve spectral efficiency. However, the prevalent use of simulated radio channel data for training and validating neural network-based radios raises concerns about their generalization capability to diverse real-world environments. To address this, we conducted empirical over-the-air (OTA) experiments using software-defined radio (SDR) technology to test the performance of an NN-based orthogonal frequency division multiplexing (OFDM) receiver in a real-world small cell scenario. Our assessment reveals that the performance of receivers trained on diverse 3GPP TS38.901 channel models and broad parameter ranges significantly surpasses conventional receivers in our testing environment, demonstrating strong generalization to a new environment. Conversely, setting simulation parameters to narrowly reflect the actual measurement environment led to suboptimal OTA performance, highlighting the crucial role of rich and randomized training data in improving the NN-based receiver's performance. While our empirical test results are promising, they also suggest that developing new channel models tailored for training these learned receivers would enhance their generalization capability and reduce training time. Our testing was limited to a relatively narrow environment, and we encourage further testing in more complex environments.