Abstract:Handwriting analysis is still an important application in machine learning. A basic requirement for any handwriting recognition application is the availability of comprehensive datasets. Standard labelled datasets play a significant role in training and evaluating learning algorithms. In this paper, we present the Khayyam dataset as another large unconstrained handwriting dataset for elements (words, sentences, letters, digits) of the Persian language. We intentionally concentrated on collecting Persian word samples which are rare in the currently available datasets. Khayyam's dataset contains 44000 words, 60000 letters, and 6000 digits. Moreover, the forms were filled out by 400 native Persian writers. To show the applicability of the dataset, machine learning algorithms are trained on the digits, letters, and word data and results are reported. This dataset is available for research and academic use.