Abstract:6D object pose tracking has been extensively studied in the robotics and computer vision communities. The most promising solutions, leveraging on deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to our best knowledge, these have not been tested thoroughly against fast object motions. Tracking performance in this scenario degrades significantly, especially for methods that do not achieve real-time performance and introduce non negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes delayed outputs of low frame rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation with the RGB-D input stream to achieve fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, which comprises fast moving objects from the YCB model set, and on the dataset for object and hand pose estimation HO-3D. Results demonstrate that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking. A video showing the experiments is provided as supplementary material.