Multi-view subspace clustering often performs well in high-dimensional data analysis, but it is sensitive to the quality of the data representation. To address this, we propose a two-stage fusion strategy that embeds representation learning into multi-view subspace clustering. We first propose a novel matrix factorization method that separates the coupled consistent and complementary information in the observations from multiple views. Based on the resulting latent representations, we further propose two subspace clustering strategies: feature-level fusion and a subspace-level hierarchical strategy. The feature-level method concatenates all latent representations from the multiple views, so the original problem reduces to single-view subspace clustering. The subspace-level hierarchical method performs separate self-expressive reconstructions on the complementary and consistent latent representations of each view, i.e., the prior constraints imposed on each type of subspace representation are matched to the corresponding input factors. Finally, extensive experiments on real-world datasets demonstrate the superiority of the proposed methods over several state-of-the-art subspace clustering algorithms.
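To make the feature-level fusion step concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes the consistent and complementary latent representations are already available as matrices with one column per sample, and it swaps in a simple ridge-regularized (least-squares) self-expressive model in place of whatever regularizers the full method uses. The function name `feature_level_affinity`, the helper `toy_block`, the weight `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def feature_level_affinity(consistent, complementary, lam=0.1):
    """Concatenate latent blocks and solve one ridge-regularized self-expressive problem.

    consistent:    array of shape (d_c, n), shared latent representation (assumed given).
    complementary: list of arrays of shape (d_v, n), one view-specific block per view.
    lam:           ridge weight (hypothetical default, not taken from the paper).
    """
    # Feature-level fusion: stack all blocks along the feature axis; columns are samples.
    H = np.vstack([consistent, *complementary])
    n = H.shape[1]
    gram = H.T @ H
    # Closed-form minimizer of ||H - H C||_F^2 + lam * ||C||_F^2.
    C = np.linalg.solve(gram + lam * np.eye(n), gram)
    # Symmetric, non-negative affinity that a spectral clustering step could consume.
    return np.abs(C) + np.abs(C.T)


# Toy data: two clusters of 10 samples whose latent factors occupy disjoint coordinates.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)


def toy_block(dims_per_cluster):
    """Build a latent block in which each cluster uses its own set of rows."""
    block = np.zeros((2 * dims_per_cluster, labels.size))
    for k in (0, 1):
        rows = slice(k * dims_per_cluster, (k + 1) * dims_per_cluster)
        block[rows, labels == k] = rng.standard_normal((dims_per_cluster, 10))
    return block


W = feature_level_affinity(toy_block(2), [toy_block(2), toy_block(3)])
same = labels[:, None] == labels[None, :]
print("mean within-cluster affinity:", W[same].mean())
print("mean cross-cluster affinity: ", W[~same].mean())
```

On this toy data the cross-cluster affinity is numerically zero only because the two clusters were constructed to occupy orthogonal coordinates; latent representations learned from real data would not be this clean. The subspace-level hierarchical strategy would instead solve a separate self-expressive problem per latent block, which this sketch does not attempt.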