Human-robot collaborative assembly systems enhance workplace efficiency and productivity but may increase the cognitive demand placed on workers. This paper proposes an online, quantitative framework to assess the cognitive workload induced by interaction with a co-worker, either a human operator or an industrial collaborative robot operating under different control strategies. The approach monitors the operator's attention distribution and upper-body kinematics using images from a low-cost stereo camera, processed with cutting-edge artificial intelligence algorithms (i.e., head pose estimation and skeleton tracking). Three experimental scenarios with variations in workstation features and interaction modalities were designed to test the performance of our online method against state-of-the-art offline measurements. The results indicate that our vision-based cognitive load assessment has the potential to be integrated into the new generation of collaborative robotic technologies. Such integration would enable human cognitive state monitoring and robot control strategy adaptation, with the aim of improving human comfort, ergonomics, and trust in automation.