Abstract:In crowd behavior understanding, a model of crowd behavior need to be trained using the information extracted from video sequences. Since there is no ground-truth available in crowd datasets except the crowd behavior labels, most of the methods proposed so far are just based on low-level visual features. However, there is a huge semantic gap between low-level motion/appearance features and high-level concept of crowd behaviors. In this paper we propose an attribute-based strategy to alleviate this problem. While similar strategies have been recently adopted for object and action recognition, as far as we know, we are the first showing that the crowd emotions can be used as attributes for crowd behavior understanding. The main idea is to train a set of emotion-based classifiers, which can subsequently be used to represent the crowd motion. For this purpose, we collect a big dataset of video clips and provide them with both annotations of "crowd behaviors" and "crowd emotions". We show the results of the proposed method on our dataset, which demonstrate that the crowd emotions enable the construction of more descriptive models for crowd behaviors. We aim at publishing the dataset with the article, to be used as a benchmark for the communities.