Casey Kennington

Incremental Dialogue Management: Survey, Discussion, and Implications for HRI

Jan 01, 2025

Casey Kennington, Pierre Lison, David Schlangen

Abstract: Efforts towards endowing robots with the ability to speak have benefited from recent advancements in NLP, in particular large language models. However, as powerful as current models have become, they still operate on sentence or multi-sentence level input, not on the word-by-word input that humans operate on. This limits the responsiveness they can offer, which is critical in situations where humans interact with robots using speech. In this paper, we review the literature on interactive systems that operate incrementally (i.e., at the word level or below it). We motivate the need for incremental systems and survey incremental modeling of important aspects of dialogue, such as speech recognition and language generation. The primary focus is on the part of the system that makes decisions, known as the dialogue manager. We find that there is very little research on incremental dialogue management, offer some requirements for practical incremental dialogue management, and discuss the implications of incremental dialogue for embodied, robotic platforms.

* 16 pages

Renaissance: Investigating the Pretraining of Vision-Language Encoders

Nov 11, 2024

Clayton Fields, Casey Kennington

Abstract: In the past several years there has been an explosion of available models for vision-language tasks. Unfortunately, the literature still leaves open a number of questions related to best practices in designing and training such models. In this paper we seek to answer several questions related to the pretraining of vision-language encoders through meta-analysis. In our first set of experiments, we show that we can save significant compute at no cost to downstream performance by freezing large parts of vision-language models during pretraining. In our second set of experiments, we examine the effect of basing a VL transformer on a vision model versus a text model. Additionally, we introduce a VL modeling platform called Renaissance that we use to conduct all of the experiments. This program offers a great deal of flexibility in creating, training, and evaluating transformer encoders for VL modeling. The source code for Renaissance can be found at https://github.com/bsu-slim/renaissance.

Unsupervised, Bottom-up Category Discovery for Symbol Grounding with a Curious Robot

Apr 03, 2024

Catherine Henry, Casey Kennington

Abstract: Towards addressing the Symbol Grounding Problem and motivated by early childhood language development, we leverage a robot which has been equipped with an approximate model of curiosity, with particular focus on bottom-up building of unsupervised categories grounded in the physical world. That is, rather than starting with a top-down symbol (e.g., a word referring to an object) and providing meaning through the application of predetermined samples, the robot autonomously and gradually breaks up its exploration space into a series of increasingly specific unlabeled categories, at which point an external expert may optionally provide a symbol association. We extend prior work by using a robot that can observe the visual world, introducing a higher dimensional sensory space, and using a more generalizable method of category building. Our experiments show that the robot learns categories based on actions and what it visually observes, and that those categories can be symbolically grounded.

* 10 pages

Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community

Apr 01, 2024

Casey Kennington, Malihe Alikhani, Heather Pon-Barry, Katherine Atwell, Yonatan Bisk, Daniel Fried, Felix Gervits, Zhao Han, Mert Inan, Michael Johnston (+13 more)

Abstract: The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.

* NSF Report on the "Dialogue with Robots" Workshop held in Pittsburgh, PA, April 2023

Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning

Feb 16, 2024

Jun Zhuang, Casey Kennington

Abstract: As research on Large Language Models (LLMs) continues, it is difficult to keep up with new work and new models. To help researchers synthesize this research, many have written survey papers, but even those have become numerous. In this paper, we develop a method to automatically assign survey papers to a taxonomy. We collect the metadata of 144 LLM survey papers and explore three paradigms to classify papers within the taxonomy. Our work indicates that leveraging graph structure information on co-category graphs can significantly outperform the language models in two paradigms: fine-tuning pre-trained language models and zero-shot/few-shot classification using LLMs. We find that our model surpasses an average human recognition level and that fine-tuning LLMs using weak labels generated by a smaller model, such as the GCN in this study, can be more effective than using ground-truth labels, revealing the potential of weak-to-strong generalization in the taxonomy classification task.

* TL;DR: We collected metadata about LLM surveys and developed a method for categorizing them into a taxonomy, indicating the superiority of graph representation learning over language models and revealing the efficacy of fine-tuning using weak labels

A Multi-Perspective Learning to Rank Approach to Support Children's Information Seeking in the Classroom

Aug 29, 2023

Garrett Allen, Katherine Landau Wright, Jerry Alan Fails, Casey Kennington, Maria Soledad Pera

Abstract: We introduce a novel re-ranking model that aims to augment the functionality of standard search engines to support classroom search activities for children (ages 6 to 11). This model extends the known listwise learning-to-rank framework by balancing risk and reward. Doing so enables the model to prioritize Web resources of high educational alignment, appropriateness, and adequate readability by analyzing the URLs, snippets, and page titles of Web resources retrieved by a given mainstream search engine. Experimental results, including an ablation study and comparisons with existing baselines, showcase the correctness of the proposed model. The outcomes of this work demonstrate the value of considering multiple perspectives inherent to the classroom setting, e.g., educational alignment, readability, and objectionability, when applied to the design of algorithms that can better support children's information discovery.

* Extended version of the manuscript to appear in proceedings of the 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology

On the Computational Modeling of Meaning: Embodied Cognition Intertwined with Emotion

Jul 12, 2023

Casey Kennington

Abstract: This document chronicles this author's attempt to explore how words come to mean what they do, with a particular focus on child language acquisition and what that means for models of language understanding. (Author's footnote: I say "historical" because I synthesize the ideas based on when I discovered them and how those ideas influenced my later thinking.) I explain the setting for child language learning, how embodiment -- being able to perceive and enact in the world, including knowledge of concrete and abstract concepts -- is crucial, and how emotion and cognition relate to each other and the language learning process. I end with what I think are some of the requirements for a language-learning agent that learns language in a setting similar to that of children. This paper can act as a potential guide for ongoing and future work in modeling language.

* 18 pages

Vision Language Transformers: A Survey

Jul 06, 2023

Clayton Fields, Casey Kennington

Abstract: Vision language tasks, such as answering questions about an image or generating captions that describe it, are difficult for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in Vaswani et al. (2017) to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations, and some open questions that remain.

Who's in Charge? Roles and Responsibilities of Decision-Making Components in Conversational Robots

Mar 15, 2023

Pierre Lison, Casey Kennington

Abstract: Software architectures for conversational robots typically consist of multiple modules, each designed for a particular processing task or functionality. Some of these modules are developed for the purpose of making decisions about the next action that the robot ought to perform in the current context. Those actions may relate to physical movements, such as driving forward or grasping an object, but may also correspond to communicative acts, such as asking a question to the human user. In this position paper, we reflect on the organization of those decision modules in human-robot interaction platforms. We discuss the relative benefits and limitations of modular vs. end-to-end architectures, and argue that, despite the increasing popularity of end-to-end approaches, modular architectures remain preferable when developing conversational robots designed to execute complex tasks in collaboration with human users. We also show that most practical HRI architectures tend to be either robot-centric or dialogue-centric, depending on where developers wish to place the "command center" of their system. While those design choices may be justified in some application domains, they also limit the robot's ability to flexibly interleave physical movements and conversational behaviours. We contend that architectures placing "action managers" and "interaction managers" on an equal footing may provide the best path forward for future human-robot interaction systems.

* Presented at the HRI 2023 workshop "Human-Robot Conversational Interaction"

Evaluating Automatic Speech Recognition in an Incremental Setting

Feb 23, 2023

Ryan Whetten, Mir Tahsin Imtiaz, Casey Kennington

Abstract: The increasing reliability of automatic speech recognition has led to its proliferation in everyday use. However, for research purposes, it is often unclear which model one should choose for a task, particularly if there is a requirement for speed as well as accuracy. In this paper, we systematically evaluate six speech recognizers on English test data using metrics including word error rate, latency, and the number of updates to already recognized words. We also propose and compare two methods for streaming audio into recognizers for incremental recognition. We further propose Revokes per Second as a new metric for evaluating incremental recognition and demonstrate that it provides insights into overall model performance. We find that, generally, local recognizers are faster and require fewer updates than cloud-based recognizers. Finally, we find Meta's Wav2Vec model to be the fastest, and find Mozilla's DeepSpeech model to be the most stable in its predictions.

* 5 pages
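
For readers unfamiliar with word error rate, one of the metrics named in the abstract above, the following is a minimal sketch of its standard definition: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the paper's evaluation code, and it does not attempt to implement Revokes per Second, whose exact definition is not given here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Tokenization is plain whitespace splitting (an assumption for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion of a reference word
                curr[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the hypothesis contains many inserted words.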
