The past few years has witnessed the dominance of Graph Convolutional Networks (GCNs) over human motion prediction, while their performance is still far from satisfactory. Recently, MLP-Mixers show competitive results on top of being more efficient and simple. To extract features, GCNs typically follow an aggregate-and-update paradigm, while Mixers rely on token mixing and channel mixing operations. The two research paths have been independently established in the community. In this paper, we develop a novel perspective by unifying Mixers and GCNs. We show that a mixer layer can be seen as a graph convolutional layer applied to a fully-connected graph with parameterized adjacency. Extending this theoretical finding to the practical side, we propose Meta-Mixing Network (M$^2$-Net). Assisted with a novel zero aggregation operation, our network is capable of capturing both the structure-agnostic and the structure-sensitive dependencies in a collaborative manner. Not only is it computationally efficient, but most importantly, it also achieves state-of-the-art performance. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that M$^2$-Net consistently outperforms all other approaches. We hope our work brings the community one step further towards truly predictable human motion. Our code will be publicly available.