Manifold distances are very effective tools for visual object recognition. However, most of the traditional manifold distances between images are based on the pixel-level comparison and thus easily affected by image rotations and translations. In this paper, we propose a new manifold distance to model the dissimilarities between visual objects based on the Complex Wavelet Structural Similarity (CW-SSIM) index. The proposed distance is more robust to rotations and translations of images than the traditional manifold distance and the CW-SSIM index based distance. In addition, the proposed distance is combined with the $k$-medoids clustering method to derive a new clustering method for visual object categorization. Experiments on Coil-20, Coil-100 and Olivetti Face Databases show that the proposed distance measure is better for visual object categorization than both the traditional manifold distances and the CW-SSIM index based distances.