Abstract: With the rapid growth and popularity of mobile applications, different verticals have accumulated abundant data on user information and social behavior that are spontaneous, genuine and diverse. However, each platform portrays its users in only one aspect, which makes it difficult to combine these internet footprints. In this research, we propose a modeling approach for analyzing users' online behavior across different social media platforms. Structured and unstructured data from users shared by NetEase Music and Sina Weibo were collected for cross-platform analysis of the correlations between music preference and other user characteristics. Based on music tags for genre and mood, we formed five genre clusters and four mood clusters by applying the K-means method to users' collected song lists. Moreover, using Weibo user data, we comprehensively studied the correlations between music preference (i.e. genre, mood) and both the Big Five personality traits (BFPs) and basic information (e.g. gender, resident region, tags), building full-scale user portraits at a finer grain. Our findings indicate that people's music preferences could be linked with their real social activities. For instance, people living in mountainous areas generally prefer folk music, while those in urban areas prefer pop music. Interestingly, dog lovers may like sad music more than cat lovers do. Moreover, our proposed cross-platform modeling approach could be adapted to other verticals, providing an automatic online way to profile users more precisely and comprehensively.
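The clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how song lists are turned into features, so the sketch assumes each user is represented by the fraction of their collected songs carrying each genre tag. The tag names, the toy data, the choice of k = 3 (the abstract reports five genre clusters and four mood clusters), and the deterministic farthest-point initialization are all illustrative assumptions, not the authors' setup.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical users: fraction of collected songs tagged (pop, folk, rock).
users = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
    [0.05, 0.05, 0.90],
]

centroids, labels = kmeans(users, k=3)
print(labels)
```

Each resulting cluster label can then be treated as a categorical music-preference variable and cross-tabulated against attributes from the second platform, such as region or gender.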
Abstract: Automatic text generation has received much attention owing to the rapid development of deep neural networks. In general, text generation systems based on statistical language models do not consider anthropomorphic characteristics, which results in machine-like generated text. To fill this gap, we propose a conditional language generation model that takes Big Five Personality (BFP) feature vectors as input context and writes human-like short texts. The short text generator consists of a long short-term memory (LSTM) layer, in which a BFP feature vector is concatenated as part of the input to each cell. To enable supervised training of the generation model, a text classification model based on a convolutional neural network (CNN) was used to prepare BFP-tagged Chinese micro-blog corpora. Validated by a BFP linguistic computational model, our generated Chinese short texts exhibit discriminative personality styles and are also syntactically correct and semantically smooth, with appropriate emoticons. By combining natural language generation with psychological linguistics, our proposed BFP-dependent text generation model can be widely used for individualization in machine translation, image captioning, dialogue generation and so on.
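The conditioning mechanism described in this abstract, a BFP feature vector concatenated to the input of each LSTM cell, can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, and the embedding size, hidden size, BFP vector layout and all numeric values are hypothetical rather than taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class LSTMCell:
    """Standard LSTM cell with untrained random weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # concatenate the step input with the previous hidden state
        i = [sigmoid(a) for a in affine(self.W["i"], self.b["i"], z)]
        f = [sigmoid(a) for a in affine(self.W["f"], self.b["f"], z)]
        o = [sigmoid(a) for a in affine(self.W["o"], self.b["o"], z)]
        g = [math.tanh(a) for a in affine(self.W["g"], self.b["g"], z)]
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new


def run(cell, embeddings, bfp, hidden_size):
    """Feed a token sequence through the cell, appending bfp at every step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for emb in embeddings:
        h, c = cell.step(emb + bfp, h, c)  # [word embedding ; BFP vector]
    return h


EMBED, BFP_DIM, HIDDEN = 4, 5, 3  # hypothetical sizes
cell = LSTMCell(EMBED + BFP_DIM, HIDDEN)

# Hypothetical word embeddings for a three-token input.
tokens = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2], [-0.3, 0.2, 0.0, 0.1]]

# Assumed layout: openness, conscientiousness, extraversion, agreeableness, neuroticism.
extravert = [0.5, 0.5, 0.9, 0.5, 0.2]
introvert = [0.5, 0.5, 0.1, 0.5, 0.2]

h_extravert = run(cell, tokens, extravert, HIDDEN)
h_introvert = run(cell, tokens, introvert, HIDDEN)
print(h_extravert)
print(h_introvert)
```

Because the personality vector enters every time step, the same token sequence yields different hidden states for different personalities. A trained model could use that signal to steer word choice, which is not shown here.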