Abstract: The advent of industrial robotics and autonomous systems enables human-robot collaboration at a massive scale. However, current industrial robots are restricted from working alongside humans in close proximity because they cannot interpret human agents' attention. Studying human attention is non-trivial, since attention involves multiple aspects of the mind: perception, memory, problem solving, and consciousness. Human attention lapses are particularly problematic, and potentially catastrophic, in industrial workplaces, in tasks ranging from assembling electronics to operating machines. Attention is complex and cannot easily be measured with single-modality sensors. Eye state, head pose, posture, and various environmental stimuli could all play a part in attention lapses. To this end, we propose a pipeline for annotating multimodal datasets for human attention tracking, covering eye tracking, fixation detection, third-person surveillance camera footage, and sound. We produce a pilot dataset containing two fully annotated phone assembly sequences recorded in a realistic manufacturing environment. We evaluate existing fatigue and drowsiness prediction methods on the task of attention lapse detection. Experimental results show that human attention lapses in production scenarios are more subtle and less perceptible than the well-studied states of fatigue and drowsiness.