Abstract:Convolutional neural networks (CNNs) are a powerful tool for many computer vision tasks, achieving much better results than traditional methods. Because CNNs have a very large capacity, training them often requires a large amount of data, but labeled images are often expensive to obtain in practice, especially for object detection, where annotating a bounding box for every object in the training set requires considerable human effort. This is the case for the detection of retail products, where there can be many different categories. In this paper, we focus on applying a CNN to detect products from 324 categories in situ, without requiring any extra bounding-box labeling for any image. Our approach is based on an algorithm that extracts bounding boxes from an in-vitro dataset and an algorithm that simulates occlusion. We demonstrate the effectiveness and usefulness of our methods by using them to build a Faster R-CNN detection model. A similar idea is also applicable in other scenarios.
Abstract:Given the complexity of the human mind and the flexibility of human behavior, sophisticated data analysis is required to sift through large amounts of behavioral evidence in order to model human minds and predict human behavior. People currently spend a significant amount of time on social media such as Twitter and Facebook, so many aspects of their lives and behaviors are digitally captured and continuously archived on these platforms. This makes social media a great source of large, rich and diverse human behavioral evidence. In this paper, we survey recent work on applying machine learning to infer human traits and behavior from social media data. We also point out several future research directions.
Abstract:In this paper, we demonstrate how state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since ground truth on substance use is difficult to obtain on a large scale, we explore different feature learning methods that take advantage of a large amount of unlabeled social media data to maximize system performance. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information, such as Facebook "likes" and "status updates", to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user's social media behavior (e.g., word usage) and substance use.
Abstract:In economics and psychology, delay discounting is often used to characterize how individuals choose between a smaller immediate reward and a larger delayed reward. People with a higher delay discounting rate (DDR) often choose smaller but more immediate rewards (a "today person"). In contrast, people with a lower discounting rate often choose larger future rewards (a "tomorrow person"). Since the ability to moderate the desire for immediate gratification in favor of long-term rewards plays an important role in decision-making, a lower discounting rate often predicts better social, academic and health outcomes. In contrast, a higher discounting rate is often associated with problematic behaviors such as alcohol/drug abuse, pathological gambling and credit card default. Thus, research on understanding and moderating delay discounting has the potential to produce substantial societal benefits.
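The choice described above can be illustrated with a small sketch. The abstract does not name a functional form, so the code assumes the hyperbolic model commonly used in the delay discounting literature, V = A / (1 + kD), where A is the reward amount, D is the delay and k is the discounting rate. The function names, reward amounts and k values are hypothetical and chosen only for illustration.

```python
def discounted_value(amount: float, delay_days: float, k: float) -> float:
    """Subjective present value under the assumed hyperbolic model A / (1 + k * D)."""
    if k < 0 or delay_days < 0:
        raise ValueError("k and delay_days must be non-negative")
    return amount / (1.0 + k * delay_days)


def choose(immediate: float, delayed: float, delay_days: float, k: float) -> str:
    """Return which reward an individual with discounting rate k would prefer."""
    # The immediate reward has zero delay, so its value is not discounted.
    if discounted_value(delayed, delay_days, k) > immediate:
        return "delayed"
    return "immediate"


if __name__ == "__main__":
    # Hypothetical choice: 50 now versus 100 in 30 days.
    for label, k in [("higher rate (today person)", 0.1),
                     ("lower rate (tomorrow person)", 0.01)]:
        print(label, "->", choose(50.0, 100.0, 30.0, k))
```

Under these assumed values, the individual with the higher rate values the delayed reward at 25 and takes the immediate 50, while the individual with the lower rate values it at roughly 77 and waits.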
Abstract:In this paper, we present a study on personalized emphasis framing, which can be used to tailor the content of a message to enhance its appeal to different individuals. With this framework, we directly model content selection decisions based on a set of psychologically motivated, domain-independent personal traits, including personality (e.g., extraversion and conscientiousness) and basic human values (e.g., self-transcendence and hedonism). We also demonstrate how the analysis results can be used in automated personalized content selection for persuasive message generation.