Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nikolai Ilinykh

Coreference as an indicator of context scope in multimodal narrative

Mar 07, 2025

Nikolai Ilinykh, Shalom Lappin, Asad Sayeed, Sharid Loáiciga

Abstract:We demonstrate that large multimodal language models differ substantially from humans in the distribution of coreferential expressions in a visual storytelling task. We introduce a number of metrics to quantify the characteristics of coreferential patterns in both human- and machine-written texts. Humans distribute coreferential expressions in a way that maintains consistency across texts and images, interleaving references to different entities in a highly varied way. Machines are less able to track mixed references, despite achieving perceived improvements in generation quality.

* 20 pages, 4 tables

Via

Access Paper or Ask Questions

Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions

Mar 08, 2023

Bill Noble, Nikolai Ilinykh

Abstract:Human language users can generate descriptions of perceptual concepts beyond instance-level representations and also use such descriptions to learn provisional class-level representations. However, the ability of computational models to learn and operate with class representations is under-investigated in the language-and-vision field. In this paper, we train separate neural networks to generate and interpret class-level descriptions. We then use the zero-shot classification performance of the interpretation model as a measure of communicative success and class-level conceptual grounding. We investigate the performance of prototype- and exemplar-based neural representations grounded category description. Finally, we show that communicative success reveals performance issues in the generation model that are not captured by traditional intrinsic NLG evaluation metrics, and argue that these issues can be traced to a failure to properly ground language in vision at the class level. We observe that the interpretation model performs better with descriptions that are low in diversity on the class level, possibly indicating a strong reliance on frequently occurring features.

Via

Access Paper or Ask Questions

We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics

Sep 10, 2021

Simon Dobnik, Robin Cooper, Adam Ek, Bill Noble, Staffan Larsson, Nikolai Ilinykh, Vladislav Maraev, Vidya Somashekarappa

Abstract:In this paper we examine different meaning representations that are commonly used in different natural language applications today and discuss their limits, both in terms of the aspects of the natural language meaning they are modelling and in terms of the aspects of the application for which they are used.

* 14 pages

Via

Access Paper or Ask Questions

MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Jul 11, 2019

Nikolai Ilinykh, Sina Zarrieß, David Schlangen

Figure 1 for MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Figure 2 for MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Figure 3 for MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Figure 4 for MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment

Abstract:Building computer systems that can converse about their visual environment is one of the oldest concerns of research in Artificial Intelligence and Computational Linguistics (see, for example, Winograd's 1972 SHRDLU system). Only recently, however, have methods from computer vision and natural language processing become powerful enough to make this vision seem more attainable. Pushed especially by developments in computer vision, many data sets and collection environments have recently been published that bring together verbal interaction and visual processing. Here, we argue that these datasets tend to oversimplify the dialogue part, and we propose a task---MeetUp!---that requires both visual and conversational grounding, and that makes stronger demands on representations of the discourse. MeetUp! is a two-player coordination game where players move in a visual environment, with the objective of finding each other. To do so, they must talk about what they see, and achieve mutual understanding. We describe a data collection and show that the resulting dialogues indeed exhibit the dialogue phenomena of interest, while also challenging the language & vision aspect.

* In Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue (semdial / LondonLogue), London, September 2019

Via

Access Paper or Ask Questions