Abstract:Warning: This paper contains memes that may be offensive to some readers. Multimodal Internet Memes are now a ubiquitous fixture in online discourse. One strand of meme-based research is the classification of memes according to various affects, such as sentiment and hate, supported by manually compiled meme datasets. Understanding the unique characteristics of memes is crucial for meme classification. Unlike other user-generated content, memes spread via memetics, i.e. the process by which memes are imitated and transformed into symbols used to create new memes. In effect, there exists an ever-evolving pool of visual and linguistic symbols that underpin meme culture and are crucial to interpreting the meaning of individual memes. The current approach of training supervised learning models on static datasets, without taking memetics into account, limits the depth and accuracy of meme interpretation. We argue that meme datasets must contain genuine memes, as defined via memetics, so that effective meme classifiers can be built. In this work, we develop a meme identification protocol which distinguishes meme from non-memetic content by recognising the memetics within it. We apply our protocol to random samplings of the leading 7 meme classification datasets and observe that more than half (50. 4\%) of the evaluated samples were found to contain no signs of memetics. Our work also provides a meme typology grounded in memetics, providing the basis for more effective approaches to the interpretation of memes and the creation of meme datasets.
Abstract:Internet Memes remain a challenging form of user-generated content for automated sentiment classification. The availability of labelled memes is a barrier to developing sentiment classifiers of multimodal memes. To address the shortage of labelled memes, we propose to supplement the training of a multimodal meme classifier with unimodal (image-only and text-only) data. In this work, we present a novel variant of supervised intermediate training that uses relatively abundant sentiment-labelled unimodal data. Our results show a statistically significant performance improvement from the incorporation of unimodal text data. Furthermore, we show that the training set of labelled memes can be reduced by 40% without reducing the performance of the downstream model.
Abstract:Internet memes are characterised by the interspersing of text amongst visual elements. State-of-the-art multimodal meme classifiers do not account for the relative positions of these elements across the two modalities, despite the latent meaning associated with where text and visual elements are placed. Against two meme sentiment classification datasets, we systematically show performance gains from incorporating the spatial position of visual objects, faces, and text clusters extracted from memes. In addition, we also present facial embedding as an impactful enhancement to image representation in a multimodal meme classifier. Finally, we show that incorporating this spatial information allows our fully automated approaches to outperform their corresponding baselines that rely on additional human validation of OCR-extracted text.