Abstract: Detecting and interpreting operator actions, engagement, and object interactions in dynamic industrial workflows remains a significant challenge in human-robot collaboration research, particularly in complex, real-world environments. Traditional unimodal methods often fail to capture the intricacies of these unstructured industrial settings. To address this gap, we present a novel Multimodal Industrial Activity Monitoring (MIAM) dataset that captures realistic assembly and disassembly tasks, facilitating the evaluation of key meta-tasks such as action localization, object interaction, and engagement prediction. The dataset comprises multi-view RGB, depth, and Inertial Measurement Unit (IMU) data collected across 22 sessions, amounting to 290 minutes of untrimmed video annotated in detail for task performance and operator behavior. Its distinctiveness lies in the integration of multiple data modalities and its emphasis on real-world, untrimmed industrial workflows, both of which are key to advancing research in human-robot collaboration and operator monitoring. Additionally, we propose a multimodal network that fuses RGB frames, IMU data, and skeleton sequences to predict engagement levels during industrial tasks. Our approach improves the accuracy of recognizing engagement states, providing a robust solution for monitoring operator performance in dynamic industrial environments. The dataset and code are available at https://github.com/navalkishoremehta95/MIAM/.
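To make the fusion idea concrete, here is a minimal late-fusion sketch in PyTorch. It is illustrative only: the encoder choices, feature dimensions, sequence lengths, and the number of engagement classes are placeholder assumptions, not the architecture of the MIAM network described in the paper.

```python
# Minimal late-fusion sketch for engagement prediction (illustrative only).
# Assumes pre-extracted per-clip RGB features and raw IMU/skeleton sequences;
# all dimensions and the 4-class engagement taxonomy are placeholders.
import torch
import torch.nn as nn

class EngagementFusionNet(nn.Module):
    def __init__(self, rgb_dim=2048, imu_dim=6, skel_dim=75,
                 hidden=256, num_classes=4):
        super().__init__()
        self.rgb_enc = nn.Linear(rgb_dim, hidden)                    # per-clip RGB feature
        self.imu_enc = nn.GRU(imu_dim, hidden, batch_first=True)     # accel + gyro stream
        self.skel_enc = nn.GRU(skel_dim, hidden, batch_first=True)   # joint coordinates
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, num_classes))

    def forward(self, rgb_feat, imu_seq, skel_seq):
        r = self.rgb_enc(rgb_feat)            # (B, hidden)
        _, i = self.imu_enc(imu_seq)          # final hidden state: (1, B, hidden)
        _, s = self.skel_enc(skel_seq)
        fused = torch.cat([r, i.squeeze(0), s.squeeze(0)], dim=-1)
        return self.head(fused)               # engagement-level logits

# Example: batch of 8 clips, 100 IMU samples and 60 skeleton frames per clip.
net = EngagementFusionNet()
logits = net(torch.randn(8, 2048), torch.randn(8, 100, 6), torch.randn(8, 60, 75))
```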
Abstract: Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTF-RU) network for egocentric activity anticipation, which uses multimodal fusion to improve prediction accuracy. Integrated with the Operator Action Monitoring Unit (OAMU), the system provides proactive operator guidance and prevents deviations in the assembly process. OAMU employs two strategies: (1) Top-5 MMTF-RU predictions, combined with a reference graph and an action dictionary, for next-step recommendations; and (2) Top-1 MMTF-RU predictions, integrated with a reference graph, for detecting sequence deviations and predicting anomaly scores via an entropy-informed confidence mechanism. We also introduce Time-Weighted Sequence Accuracy (TWSA) to evaluate operator efficiency and ensure timely task completion. Our approach is validated on the industrial MECCANO dataset and the large-scale EPIC-Kitchens-55 dataset, demonstrating its effectiveness in dynamic environments.
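The sketch below illustrates one plausible reading of the entropy-informed confidence mechanism: the normalized entropy of the predicted next-action distribution gauges model confidence, and a confident Top-1 prediction that leaves the reference graph receives a high anomaly score. The function name, graph encoding, and scoring rule are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of an entropy-informed anomaly score for sequence-deviation detection.
# Hypothetical reading of the abstract, not the OAMU's exact formulation.
import math

def anomaly_score(probs, current_action, reference_graph):
    """probs: dict action -> probability (sums to 1);
    reference_graph: dict action -> set of valid successor actions."""
    top1 = max(probs, key=probs.get)
    # Normalized Shannon entropy in [0, 1]: high value = uncertain model.
    h = -sum(p * math.log(p) for p in probs.values() if p > 0)
    h_norm = h / math.log(len(probs)) if len(probs) > 1 else 0.0
    confidence = 1.0 - h_norm
    deviates = top1 not in reference_graph.get(current_action, set())
    # Confident off-graph predictions score highest; on-graph steps score 0.
    return (confidence if deviates else 0.0), top1, deviates

# Toy example with a three-step assembly reference graph.
graph = {"pick_screw": {"align_part"}, "align_part": {"tighten_screw"}}
probs = {"tighten_screw": 0.7, "align_part": 0.2, "pick_screw": 0.1}
score, pred, dev = anomaly_score(probs, "pick_screw", graph)  # dev is True here
```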