In this paper, we tackle the problem of action recognition using body skeletons extracted from video sequences. Our approach lies in the continuity of recent works representing video frames by Gramian matrices that describe a trajectory on the Riemannian manifold of positive-semidefinite matrices of fixed rank. In comparison with previous works, the manifold of fixed-rank positive-semidefinite matrices is here endowed with a different metric, and we resort to different algorithms for the curve fitting and temporal alignment steps. We evaluated our approach on three publicly available datasets (UTKinect-Action3D, KTH-Action and UAV-Gesture). The results of the proposed approach are competitive with respect to state-of-the-art methods, while only involving body skeletons.