Content-based near-duplicate video detection (NDVD) is essential for effective search and retrieval, and robust video fingerprinting is a well-suited solution for NDVD. Most existing video fingerprinting methods use a single feature, or a concatenation of different features, to generate video fingerprints, and they perform well under single-mode modifications such as noise addition and blurring. However, under combined modifications their performance degrades to a certain extent, because such features cannot characterize the video content completely. By contrast, the assistance and consensus among different features can improve the performance of video fingerprinting. Therefore, in the present study, we mine the assistance and consensus among different features based on a tensor model, and present a new comprehensive feature that fully exploits them in the proposed video fingerprinting framework. We also analyze what the comprehensive feature actually represents with respect to the original video. In this framework, the video is first represented as a high-order tensor that consists of different features, and the video tensor is decomposed via the Tucker model, with a solution that determines the number of components. Subsequently, the comprehensive feature is generated from the low-order tensor obtained from the tensor decomposition. Finally, the video fingerprint is computed from this feature. A matching strategy based on the core tensor is also proposed to narrow the search. The resulting robust video fingerprinting framework is resistant not only to single-mode modifications but also to combinations of them.
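To make the pipeline above concrete, the following is a minimal sketch of a Tucker decomposition computed with a truncated higher-order SVD (HOSVD) in NumPy. It is an illustration under stated assumptions, not the method of this paper: the tensor layout, the fixed `ranks`, and the helper names (`unfold`, `mode_product`, `truncated_hosvd`, `binary_fingerprint`) are all hypothetical, the number of components is supplied by hand rather than determined automatically, and the magnitude-thresholding step merely stands in for whatever fingerprint computation the framework actually uses.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Tucker decomposition via truncated HOSVD.

    `ranks` (one entry per mode) is chosen by the caller here; this is an
    assumption of the sketch, not an automatic component-selection rule.
    Returns the core tensor and one orthonormal factor matrix per mode.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its retained subspace.
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def binary_fingerprint(core):
    """Hypothetical fingerprint: threshold core magnitudes at their median.

    Magnitudes are used because singular vectors are only defined up to sign.
    """
    magnitudes = np.abs(core).ravel()
    return (magnitudes > np.median(magnitudes)).astype(np.uint8)


# Toy stand-in for a video tensor: (height, width, feature type, frame).
rng = np.random.default_rng(0)
video_tensor = rng.standard_normal((8, 8, 3, 10))
core, factors = truncated_hosvd(video_tensor, ranks=(4, 4, 2, 5))
fingerprint = binary_fingerprint(core)
```

In this sketch the core tensor is far smaller than the input (4×4×2×5 versus 8×8×3×10), which is what makes it a plausible basis both for a compact fingerprint and for a coarse first matching pass before a finer comparison.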