Spatio-temporal graph signal analysis has a significant impact on a wide range of applications, including hand/body pose action recognition. To achieve effective analysis, spatio-temporal graph convolutional networks (ST-GCN) leverage the powerful learning ability to achieve great empirical successes; however, those methods need a huge amount of high-quality training data and lack theoretical interpretation. To address this issue, the spatio-temporal graph scattering transform (ST-GST) was proposed to put forth a theoretically interpretable framework; however, the empirical performance of this approach is constrainted by the fully mathematical design. To benefit from both sides, this work proposes a novel complementary mechanism to organically combine the spatio-temporal graph scattering transform and neural networks, resulting in the proposed spatio-temporal graph complementary scattering networks (ST-GCSN). The essence is to leverage the mathematically designed graph wavelets with pruning techniques to cover major information and use trainable networks to capture complementary information. The empirical experiments on hand pose action recognition show that the proposed ST-GCSN outperforms both ST-GCN and ST-GST.