Abstract:The graph convolutional network has been applied to 3D human pose estimation. In addition, the pure transformer model recently show the promising result in the video-based method. However, the single-frame method still need to model the physically connected relations among joints because the feature representation transformed only by the global attention has the lack of the relationships of human skeleton. We propose a novel architecture to combine the physically connected and global relations among joints in human. We evaluate our method on Human3.6m and compare with the state-of-the-art models. Our model show superior result over all other models. Our model has better generalization ability by cross-dataset comparison on MPI-INF-3DHP.