Abstract:This research presents the idea of activity fusion into existing Pose Estimation architectures to enhance their predictive ability. This is motivated by the rise in higher level concepts found in modern machine learning architectures, and the belief that activity context is a useful piece of information for the problem of pose estimation. To analyse this concept we take an existing deep learning architecture and augment it with an additional 1x1 convolution to fuse activity information into the model. We perform evaluation and comparison on a common pose estimation dataset, and show a performance improvement over our baseline model, especially in uncommon poses and on typically difficult joints. Additionally, we perform an ablative analysis to indicate that the performance improvement does in fact draw from the activity information.