3D skeleton-based motion prediction and activity recognition are two interwoven tasks in human behaviour analysis. In this work, we propose a motion context modeling methodology that provides a new way to combine the advantages of both graph convolutional neural networks and recurrent neural networks for joint human motion prediction and activity recognition. Our approach is based on using an LSTM encoder-decoder and a non-local feature extraction attention mechanism to model the spatial correlation of human skeleton data and temporal correlation among motion frames. The proposed network can easily include two output branches, one for Activity Recognition and one for Future Motion Prediction, which can be jointly trained for enhanced performance. Experimental results on Human 3.6M, CMU Mocap and NTU RGB-D datasets show that our proposed approach provides the best prediction capability among baseline LSTM-based methods, while achieving comparable performance to other state-of-the-art methods.