Within the human-robot interaction (HRI) community, many researchers have focused on the careful design of human-subjects studies. However, other parts of the community, e.g., the technical advances community, also need to do human-subjects studies to collect data to train their models, in ways that require user studies but without a strict experimental design. The design of such data collection is an underexplored area worthy of more attention. In this work, we contribute a clearly defined process to collect data with three steps for machine learning modeling purposes, grounded in recent literature, and detail an use of this process to facilitate the collection of a corpus of referring expressions. Specifically, we discuss our data collection goal and how we worked to encourage well-covered and abundant participant responses, through our design of the task environment, the task itself, and the study procedure. We hope this work would lead to more data collection formalism efforts in the HRI community and a fruitful discussion during the workshop.