This paper proposes a novel 4D Facial Expression Recognition (FER) method using Collaborative Cross-domain Dynamic Image Network (CCDN). Given a 4D data of face scans, we first compute its geometrical images, and then combine their correlated information in the proposed cross-domain image representations. The acquired set is then used to generate cross-domain dynamic images (CDI) via rank pooling that encapsulates facial deformations over time in terms of a single image. For the training phase, these CDIs are fed into an end-to-end deep learning model, and the resultant predictions collaborate over multi-views for performance gain in expression classification. Furthermore, we propose a 4D augmentation scheme that not only expands the training data scale but also introduces significant facial muscle movement patterns to improve the FER performance. Results from extensive experiments on the commonly used BU-4DFE dataset under widely adopted settings show that our proposed method outperforms the state-of-the-art 4D FER methods by achieving an accuracy of 96.5% indicating its effectiveness.