Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Due to the existence of Compound Expressions , human emotional expressions are complex, requiring consideration of both local and global facial expressions to make judgments. In this paper, to address this issue, we propose a solution based on ensemble learning methods for Compound Expression Recognition. Specifically, our task is classification, where we train three expression classification models based on convolutional networks, Vision Transformers, and multi-scale local attention networks. Then, through model ensemble using late fusion, we merge the outputs of multiple models to predict the final result. Our method achieves high accuracy on RAF-DB and is able to recognize expressions through zero-shot on certain portions of C-EXPR-DB.