In multi-camera video surveillance, it is challenging to represent videos from different cameras properly and fuse them efficiently for specific applications such as human activity recognition and clustering. In this paper, a novel representation for multi-camera video data, namely the Product Grassmann Manifold (PGM), is proposed to model video sequences as points on the Grassmann manifold and integrate them as a whole in the product manifold form. Additionally, with a new geometry metric on the product manifold, the conventional Low Rank Representation (LRR) model is extended onto PGM and the new LRR model can be used for clustering non-linear data, such as multi-camera video data. To evaluate the proposed method, a number of clustering experiments are conducted on several multi-camera video datasets of human activity, including Dongzhimen Transport Hub Crowd action dataset, ACT 42 Human action dataset and SKIG action dataset. The experiment results show that the proposed method outperforms many state-of-the-art clustering methods.