Abstract: Human Action Recognition (HAR) is a crucial task in computer vision that supports a range of downstream tasks, such as understanding human behavior. Because human behavior is complex, many highly valuable behaviors are not yet covered by the available HAR datasets, e.g., human habitual behaviors (HHBs). HHBs are important for analyzing a person's personality, habits, and psychological changes. To address this gap, we build a novel video dataset of various HHBs. The behaviors in the proposed dataset reflect the internal mental states and specific emotions of the characters, e.g., crossing one's arms suggests shielding oneself from perceived threats. The dataset contains 30 categories of habitual behaviors, comprising more than 300,000 frames and 6,899 action instances. Since these behaviors usually occupy only small local regions of human action videos, existing action recognition methods have difficulty capturing the relevant local features. We therefore also propose a two-stream model that uses both human skeletons and RGB appearance. Experimental results demonstrate that the proposed method substantially outperforms existing action recognition methods on the proposed dataset.