In this paper, we propose detecting facial action units (AUs) from 3D facial landmarks. Specifically, we train a 2D convolutional neural network (CNN) on 3D facial landmarks, tracked using a shape index-based statistical shape model, for binary and multi-class AU detection. We show that the proposed approach accurately models AU occurrences, because the movement of the facial landmarks corresponds directly to the movement of the AUs. Training a CNN on 3D landmarks yields accurate AU detection on two state-of-the-art emotion datasets, BP4D and BP4D+. Using the proposed method, we detect multiple AUs on over 330,000 frames and report improved results over state-of-the-art methods.
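To make the input representation concrete, the following is a minimal sketch of how a set of 3D landmarks could be passed through a 2D convolution to produce per-AU scores. It is an illustration under stated assumptions, not the architecture used in this work: the landmark count, filter count, kernel size, number of AUs, the random untrained weights, and the function names `conv2d_valid` and `predict_aus` are all hypothetical. The landmarks are arranged as an N x 3 single-channel matrix (one row per landmark, columns x, y, z), and each AU receives an independent sigmoid score so that several AUs can be active in the same frame.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_aus(landmarks, kernels, dense_w, dense_b, threshold=0.5):
    """Forward pass: conv -> ReLU -> global average pool -> dense -> sigmoid."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(landmarks, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]  # ReLU
        features.append(sum(activations) / len(activations))      # pooling
    probs = []
    for w_row, b in zip(dense_w, dense_b):
        logit = sum(w * f for w, f in zip(w_row, features)) + b
        probs.append(sigmoid(logit))  # one independent score per AU
    return probs, [p >= threshold for p in probs]


# Placeholder sizes chosen only for illustration (assumptions).
N_LANDMARKS, N_FILTERS, N_AUS = 10, 4, 3
rng = random.Random(0)

# One frame: N_LANDMARKS rows of (x, y, z); random stand-in for tracked data.
landmarks = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(N_LANDMARKS)]

# Untrained random weights; a real model would learn these from labeled frames.
kernels = [[[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(3)]
           for _ in range(N_FILTERS)]
dense_w = [[rng.gauss(0, 0.5) for _ in range(N_FILTERS)] for _ in range(N_AUS)]
dense_b = [0.0] * N_AUS

probs, active = predict_aus(landmarks, kernels, dense_w, dense_b)
print(probs, active)
```

With a single output and a single threshold this reduces to binary detection of one AU; with several outputs it scores multiple AUs per frame. In practice the weights would be fit with a deep learning framework rather than hand-written loops.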