The value of quick, accurate, and confident diagnoses cannot be overstated in mitigating the effects of COVID-19 infection, particularly for severe cases. Enormous effort has been put towards developing deep learning methods to classify and detect COVID-19 infections from chest radiography images. However, questions have recently been raised about the clinical viability and effectiveness of such methods. In this work, we carry out extensive experiments on a large COVID-19 chest X-ray dataset to investigate the challenges of creating reliable AI solutions from both the data and machine learning perspectives. Accordingly, we offer an in-depth discussion of the challenges that some widely used deep learning architectures face in chest X-ray COVID-19 classification. Finally, we outline possible directions and considerations for improving both the models and the data for use in clinical settings.