Abstract:People are regularly confronted with potentially deceptive statements (e.g., fake news, misleading product reviews, or lies about activities). Only few works on automated text-based deception detection have exploited the potential of deep learning approaches. A critique of deep-learning methods is their lack of interpretability, preventing us from understanding the underlying (linguistic) mechanisms involved in deception. However, recent advancements have made it possible to explain some aspects of such models. This paper proposes and evaluates six deep-learning models, including combinations of BERT (and RoBERTa), MultiHead Attention, co-attentions, and transformers. To understand how the models reach their decisions, we then examine the model's predictions with LIME. We then zoom in on vocabulary uniqueness and the correlation of LIWC categories with the outcome class (truthful vs deceptive). The findings suggest that our transformer-based models can enhance automated deception detection performances (+2.11% in accuracy) and show significant differences pertinent to the usage of LIWC features in truthful and deceptive statements.
Abstract:The popularity of online shopping is steadily increasing. At the same time, fake product reviewsare published widely and have the potential to affect consumer purchasing behavior. In response,previous work has developed automated methods for the detection of deceptive product reviews.However, studies vary considerably in terms of classification performance, and many use data thatcontain potential confounds, which makes it difficult to determine their validity. Two possibleconfounds are data-origin (i.e., the dataset is composed of more than one source) and productownership (i.e., reviews written by individuals who own or do not own the reviewed product). Inthe present study, we investigate the effect of both confounds for fake review detection. Using anexperimental design, we manipulate data-origin, product ownership, review polarity, and veracity.Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectablebut reviews additionally confounded with product-ownership (66.19 - 74.17%), or with data-origin(84.44 - 86.94%) are easier to classify. Review veracity is most easily classified if confounded withproduct-ownership and data-origin combined (87.78 - 88.12%), suggesting overestimations of thetrue performance in other work. These findings are moderated by review polarity.