Abstract: Audience interests, demographics, purchase behavior, and other possible classifications are extremely important factors to study carefully in a targeting campaign. This information helps advertisers and publishers deliver advertisements to the right audience group. However, such information is not easy to collect, especially for online audiences, with whom we have limited interaction and minimal deterministic knowledge. In this paper, we propose a predictive framework that estimates the demographic attributes of online audiences from their browsing histories. Under the proposed framework, we first retrieve the content of the websites visited by the audience and represent that content as website feature vectors; second, we aggregate the vectors of the websites each user has visited to obtain feature vectors representing the users; finally, a support vector machine (SVM) is used to predict the audience's demographic attributes. The key to achieving good prediction performance is preparing representative audience features. The proposed method combines word embedding, a technique widely used in natural language processing tasks, with the term frequency-inverse document frequency (TF-IDF) weighting scheme. This new representation approach is unsupervised and very easy to implement. The experimental results demonstrate that the new audience feature representation is more powerful than existing baseline methods, leading to a substantial improvement in prediction accuracy.
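The feature-construction stage described above (website vectors built from word embeddings with TF-IDF weights, then aggregated per user) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the website vector is assumed to be a TF-IDF-weighted average of word vectors, the IDF uses a common smoothed form, the per-user aggregation is assumed to be a plain mean, and the function names (`idf_table`, `website_vector`, `user_vector`) and toy embeddings are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency for each term over a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    # Smoothed IDF (assumed form): log((1 + n) / (1 + df)) + 1, always positive.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def website_vector(tokens, embeddings, idf, dim):
    """TF-IDF-weighted average of the word vectors of one website's tokens (assumed scheme)."""
    tf = Counter(tokens)
    vec = [0.0] * dim
    total = 0.0
    for term, count in tf.items():
        if term not in embeddings:
            continue  # skip out-of-vocabulary words
        w = (count / len(tokens)) * idf.get(term, 0.0)  # normalized TF times IDF
        for i, x in enumerate(embeddings[term]):
            vec[i] += w * x
        total += w
    return [x / total for x in vec] if total > 0 else vec


def user_vector(visited, site_vectors, dim):
    """Mean of the vectors of the websites a user visited (aggregation rule assumed)."""
    vecs = [site_vectors[s] for s in visited if s in site_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Toy 2-d embeddings and website contents (hypothetical data).
embeddings = {
    "football": [1.0, 0.0],
    "score": [0.8, 0.2],
    "recipe": [0.0, 1.0],
    "baking": [0.1, 0.9],
}
sites = {
    "sports.example": ["football", "score", "football"],
    "cooking.example": ["recipe", "baking", "recipe"],
}
idf = idf_table(list(sites.values()))
site_vecs = {s: website_vector(t, embeddings, idf, 2) for s, t in sites.items()}
u = user_vector(["sports.example", "cooking.example"], site_vecs, 2)
print(site_vecs, u)
```

The resulting user vectors would then serve as inputs to a classifier for each demographic attribute; in Python, one option is scikit-learn's `sklearn.svm.SVC`, though the kernel and hyperparameters used in the paper are not stated here. Averaging keeps the user vector in the same space as the word embeddings regardless of how many sites a user visited.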