In this paper, we propose a novel multi-level aggregation network that regresses the vertex coordinates of a 3D face from a single 2D image in an end-to-end manner. This is achieved by combining standard convolutional neural networks (CNNs) with graph convolutional networks (GCNs). By iteratively and hierarchically fusing features across different layers and stages of the CNNs and GCNs, our approach performs dense face alignment and 3D face reconstruction simultaneously, and it benefits from learning features directly on the 3D face mesh. Experiments on several challenging datasets demonstrate that our method outperforms state-of-the-art approaches on both 2D and 3D face alignment tasks.
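
To make the idea of regressing mesh vertices from fused image and graph features concrete, the following is a minimal NumPy sketch and not the proposed network. It assumes a symmetrically normalized adjacency matrix with self-loops as the graph convolution, and it assumes fusion by concatenating one global image feature vector onto every vertex. All names (`normalized_adjacency`, `graph_conv`, `regress_vertices`), the toy mesh, and the random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_adjacency(edges, num_vertices):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected mesh graph."""
    a = np.eye(num_vertices)  # self-loops keep each vertex's own feature
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(a_hat, x, w):
    """One graph convolution: aggregate neighbors, project, apply ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)


def regress_vertices(image_feat, vertex_feat, edges, w_hidden, w_out):
    """Fuse a global image feature with per-vertex features and regress 3D coordinates."""
    n = vertex_feat.shape[0]
    a_hat = normalized_adjacency(edges, n)
    # Assumed fusion scheme: copy the image feature to every vertex and concatenate.
    fused = np.concatenate([vertex_feat, np.tile(image_feat, (n, 1))], axis=1)
    hidden = graph_conv(a_hat, fused, w_hidden)
    # Linear output layer (no activation) so coordinates can be negative.
    return a_hat @ hidden @ w_out


# Toy example: a 4-vertex mesh (two triangles) and random, untrained weights.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
image_feat = rng.normal(size=8)        # stand-in for a CNN feature vector
vertex_feat = rng.normal(size=(4, 3))  # stand-in for template vertex positions
w_hidden = rng.normal(size=(3 + 8, 16))
w_out = rng.normal(size=(16, 3))

coords = regress_vertices(image_feat, vertex_feat, edges, w_hidden, w_out)
print(coords.shape)  # (4, 3): one 3D coordinate per mesh vertex
```

In this sketch the image feature enters at a single point. A multi-level scheme would instead repeat the fusion step with features taken from several CNN layers and feed each result into successive graph convolution stages.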