The key research question for image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, yet specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity, while specificity is mostly ignored. In this paper we address both aspects through multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial for prior art that relies on a semantic segmentation loss to take into account. We realize these ideas in a new network, which we term MVSS-Net, and its enhanced version, MVSS-Net++. Comprehensive experiments on six public benchmark datasets demonstrate the viability of the MVSS-Net series for both pixel-level and image-level manipulation detection.