Abstract:Classifying the sub-categories of an object from the same super-category (e.g., bird) in a fine-grained visual classification (FGVC) task highly relies on mining multiple discriminative features. Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion. In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB), in combination with two alternative correlation metrics, to address this problem. The key idea is to randomly mask out a group of correlated channels during training to destruct features from co-adaptations and thus enhance feature representations. Extensive experiments on three benchmark FGVC datasets show that CDB effectively improves the performance.