This study investigates the acoustic and linguistic features of preschool speech in order to design an automatic speech recognition (ASR) system. Acoustic variability has been identified as a significant barrier to building high-performance ASR applications for young children. Because of the epidemic, preschool speech assessment has to be conducted online, which creates a need for an automatic speech recognition system. Our cognitive system posed new challenges, including transcribing meaningless words from speech to text and recognizing word sequences. After experimenting with several models, we obtained a phoneme error rate of 3.1\% in Persian. Wav2Vec 2.0 is a framework that can be used to build a robust end-to-end speech recognition system.
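For readers unfamiliar with the metric, phoneme error rate is conventionally computed as the number of phoneme substitutions, deletions, and insertions needed to turn the recognized sequence into the reference, divided by the number of reference phonemes. The following is a minimal sketch of that conventional definition, not necessarily the exact scoring procedure used in this study; the function names and the example phoneme sequences are hypothetical and are not data from the experiments.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edit errors / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference contains no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Hypothetical example: two utterances, one substituted phoneme in the first.
example_refs = [["s", "a", "l", "a", "m"], ["k", "e", "t", "a", "b"]]
example_hyps = [["s", "a", "l", "o", "m"], ["k", "e", "t", "a", "b"]]
example_per = phoneme_error_rate(example_refs, example_hyps)
```

Pooling errors over the whole corpus, as assumed here, weights long utterances more heavily than averaging per-utterance rates would; either convention is in use, so the choice should be stated when reporting results.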