Multi-view learning (MVL) is a strategy for fusing data from different sources or subsets. Canonical correlation analysis (CCA) plays an important role in MVL; its main idea is to project data from different views onto a common space in which their correlation is maximized. Traditional CCA, however, can only capture linear correlation between two views. Moreover, it is unsupervised, so label information goes unused in supervised learning tasks. Many nonlinear, supervised, and generalized extensions have been proposed to overcome these limitations. However, to our knowledge, there is no up-to-date overview of these approaches. This paper fills this gap by providing a comprehensive overview of many classical and recent CCA approaches and describing their typical applications in pattern recognition, multi-modal retrieval and classification, and multi-view embedding.
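To make "maximum correlation in a common space" concrete, the following is a minimal sketch of classical two-view linear CCA in Python with NumPy, using the common whitening-plus-SVD formulation. The function name `linear_cca`, the small ridge term `reg` added for numerical stability, and the synthetic two-view data are illustrative assumptions rather than anything prescribed by a specific method in this survey.

```python
import numpy as np

def linear_cca(X, Y, k=1, reg=1e-6):
    """Sketch of linear CCA: find Wx, Wy maximizing corr(X @ Wx, Y @ Wy).

    X: (n, p) first view, Y: (n, q) second view.
    reg: small ridge term on the covariances (an assumption for stability).
    Returns projections Wx (p, k), Wy (q, k) and the top-k canonical correlations.
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-view and cross-view covariance matrices.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Cxx_is = inv_sqrt(Cxx)
    Cyy_is = inv_sqrt(Cyy)

    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    Wx = Cxx_is @ U[:, :k]
    Wy = Cyy_is @ Vt.T[:, :k]
    return Wx, Wy, s[:k]

# Synthetic example: both views share one latent signal z, plus unrelated noise features.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 1)), -2 * z + 0.1 * rng.normal(size=(500, 1))])

Wx, Wy, corr = linear_cca(X, Y, k=1)
u, v = X @ Wx, Y @ Wy  # projections of both views onto the common space
print(corr[0], np.corrcoef(u[:, 0], v[:, 0])[0, 1])
```

Because the projections are linear in the original features, this sketch can only recover linearly related structure between exactly two views, which is the limitation that the nonlinear and generalized extensions discussed in this survey aim to address.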