Abstract:Objective criteria for universal semantic components that distinguish a humorous utterance from a non-humorous one are presently under debate. In this article, we give an in-depth observation of our system of self-paced reading for annotation of humor, that collects readers' annotations while they open a text word by word. The system registers keys that readers press to open the next word, choose a class (humorous versus non-humorous texts), change their choice. We also touch upon our psycho-linguistic experiment conducted with the system and the data collected during it.
Abstract:The article describes a fast solution to propaganda detection at SemEval-2020 Task 11, based onfeature adjustment. We use per-token vectorization of features and a simple Logistic Regressionclassifier to quickly test different hypotheses about our data. We come up with what seems to usthe best solution, however, we are unable to align it with the result of the metric suggested by theorganizers of the task. We test how our system handles class and feature imbalance by varying thenumber of samples of two classes (Propaganda and None) in the training set, the size of a contextwindow in which a token is vectorized and combination of vectorization means. The result of oursystem at SemEval2020 Task 11 is F-score=0.37.
Abstract:Being a matter of cognition, user interests should be apt to classification independent of the language of users, social network and content of interest itself. To prove it, we analyze a collection of English and Russian Twitter and Vkontakte community pages by interests of their followers. First, we create a model of Major Interests (MaIs) with the help of expert analysis and then classify a set of pages using machine learning algorithms (SVM, Neural Network, Naive Bayes, and some other). We take three interest domains that are typical of both English and Russian-speaking communities: football, rock music, vegetarianism. The results of classification show a greater correlation between Russian-Vkontakte and Russian-Twitter pages while English-Twitterpages appear to provide the highest score.