Abstract:Response process data collected from human-computer interactive items contain rich information about respondents' behavioral patterns and cognitive processes. Their irregular formats as well as their large sizes make standard statistical tools difficult to apply. This paper develops a computationally efficient method for exploratory analysis of such process data. The new approach segments a lengthy individual process into a sequence of short subprocesses to achieve complexity reduction, easy clustering and meaningful interpretation. Each subprocess is considered a subtask. The segmentation is based on sequential action predictability using a parsimonious predictive model combined with the Shannon entropy. Simulation studies are conducted to assess performance of the new methods. We use the process data from PIAAC 2012 to demonstrate how exploratory analysis of process data can be done with the new approach.
Abstract:Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents' response processes of solving the items. Process data analysis aims at enhancing educational assessment accuracy and serving other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. We define an S3 class "proc" for organizing process data and extend generic methods summary and print for class "proc". Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. ProcData also provides functions for fitting and making predictions from a neural-network-based sequence model. These functions call relevant functions in package keras for constructing and training neural networks. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.
Abstract:Computer simulations have become a popular tool of assessing complex skills such as problem-solving skills. Log files of computer-based items record the entire human-computer interactive processes for each respondent. The response processes are very diverse, noisy, and of nonstandard formats. Few generic methods have been developed for exploiting the information contained in process data. In this article, we propose a method to extract latent variables from process data. The method utilizes a sequence-to-sequence autoencoder to compress response processes into standard numerical vectors. It does not require prior knowledge of the specific items and human-computers interaction patterns. The proposed method is applied to both simulated and real process data to demonstrate that the resulting latent variables extract useful information from the response processes.