Abstract: In many fields of experimental science, papers that failed to replicate continue to be cited because replication studies are hard to discover. As a first step toward a system that automatically finds replication studies for a given paper, 334 replication studies and 344 replicated studies were collected. Replication studies could be identified in the dataset from their text content better than chance (AUROC = 0.886). Additionally, successful replication studies could be distinguished from failed replication studies better than chance (AUROC = 0.664).
Abstract: The goal of this study was to improve the post-processing of precipitation forecasts using convolutional neural networks (CNNs). Instead of post-processing forecasts on a per-pixel basis, as is usually done when machine learning is applied to meteorological post-processing, input forecast images were combined and transformed into probabilistic output forecast images using fully convolutional neural networks. CNNs did not outperform regularized logistic regression. Additionally, an ablation analysis was performed. Combining input forecasts from a global low-resolution weather model and a regional high-resolution weather model improved performance over using either one alone.
Abstract: Detecting deception in natural language has a wide variety of applications, but because deception is hidden by nature, there are no public, large-scale sources of labeled deceptive text. This work introduces the Mafiascum dataset [1], a collection of over 700 games of Mafia, in which players are randomly assigned either deceptive or non-deceptive roles and then interact via forum postings. Almost 10,000 documents were compiled from the dataset, each containing all messages written by a single player in a single game. This corpus was used to construct a set of hand-picked linguistic features based on prior deception research and a set of average word vectors enriched with subword information. An SVM classifier fit on a combination of these feature sets achieved an area under the precision-recall curve of 0.35 (chance = 0.26) and an ROC AUC of 0.64 (chance = 0.50). [1] https://bitbucket.org/bopjesvla/thesis/src
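
The two chance levels quoted in the last abstract differ because the metrics have different baselines: a scorer that ranks documents at random has an expected ROC AUC of 0.50 regardless of class balance, while its area under the precision-recall curve is approximately the fraction of positive documents. The sketch below illustrates this with synthetic labels. It assumes that the quoted 0.26 corresponds to the share of deceptive documents, which the abstract does not state explicitly. The function names are hypothetical, the data are random rather than taken from the Mafiascum dataset, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC from the rank sum of the positives (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive item."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic stand-in for the corpus: about 10,000 documents, with an
# ASSUMED positive (deceptive) rate of 0.26. Not real data.
rng = random.Random(0)
labels = [1 if rng.random() < 0.26 else 0 for _ in range(10_000)]
random_scores = [rng.random() for _ in labels]  # a scorer with no signal

prevalence = sum(labels) / len(labels)
auc_random = roc_auc(labels, random_scores)
ap_random = average_precision(labels, random_scores)

print(f"positive rate:       {prevalence:.3f}")
print(f"ROC AUC (random):    {auc_random:.3f}")
print(f"avg. precision (random): {ap_random:.3f}")
```

Under these assumptions, the random scorer's ROC AUC should land near 0.50 and its average precision near the positive rate, which is why a reported value of 0.35 against a 0.26 baseline and 0.64 against a 0.50 baseline both indicate performance above chance.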