Abstract:Recent research in behaviour understanding through language grounding has shown it is possible to automatically generate behaviour models from textual instructions. These models usually have goal-oriented structure and are modelled with different formalisms from the planning domain such as the Planning Domain Definition Language. One major problem that still remains is that there are no benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on domain-specific application. To allow the objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English language, their refinement in a more structured form as well as manually developed plans for each of the instructions. The dataset is publicly available to the community.