Abstract:Forensic analysts often use social media imagery and texts to understand important events. A primary challenge is the initial sifting of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for social-media data sifting for forensics investigations of events of interest, the value of active learning and pseudo-labeling varies based on the setting, and incorporating unlabelled data from other events improves performance.
Abstract:Social media has become an important data source for event analysis. When collecting this type of data, most contain no useful information to a target event. Thus, it is essential to filter out those noisy data at the earliest opportunity for a human expert to perform further inspection. Most existing solutions for event filtering rely on fully supervised methods for training. However, in many real-world scenarios, having access to large number of labeled samples is not possible. To deal with a few labeled sample training problem for event filtering, we propose a graph-based few-shot learning pipeline. We also release the Brazilian Protest Dataset to test our method. To the best of our knowledge, this dataset is the first of its kind in event filtering that focuses on protests in multi-modal social media data, with most of the text in Portuguese. Our experimental results show that our proposed pipeline has comparable performance with only a few labeled samples (60) compared with a fully labeled dataset (3100). To facilitate the research community, we make our dataset and code available at https://github.com/jdnascim/7Set-AL.