In acoustic scene classification researches, audio segment is usually split into multiple samples. Majority voting is then utilized to ensemble the results of the samples. In this paper, we propose a punishment voting algorithm based on the super categories construction method for acoustic scene classification. Specifically, we propose a DenseNet-like model as the base classifier. The base classifier is trained by the CQT spectrograms generated from the raw audio segments. Taking advantage of the results of the base classifier, we propose a super categories construction method using the spectral clustering. Super classifiers corresponding to the constructed super categories are further trained. Finally, the super classifiers are utilized to enhance the majority voting of the base classifier by punishment voting. Experiments show that the punishment voting obviously improves the performances on both the DCASE2017 Development dataset and the LITIS Rouen dataset.