Abstract:Achieving both high speed and precision in robot operations is a significant challenge for social implementation. While factory robots excel at predefined tasks, they struggle with environment-specific actions like cleaning and cooking. Deep learning research aims to address this by enabling robots to autonomously execute behaviors through end-to-end learning with sensor data. RT-1 and ACT are notable examples that have expanded robots' capabilities. However, issues with model inference speed and hand position accuracy persist. High-quality training data and fast, stable inference mechanisms are essential to overcome these challenges. This paper proposes a motion generation model for high-speed, high-precision tasks, exemplified by the sports stacking task. By teaching motions slowly and inferring at high speeds, the model achieved a 94% success rate in stacking cups with a real robot.