Abstract:This paper presents a novel hybrid algorithm designed to interpret natural human commands in tabletop scenarios. By integrating multiple sources of information, including speech, gestures, and scene context, the system extracts actionable instructions for a robot, identifying relevant objects and actions. The system operates in a zero-shot fashion, without reliance on predefined object models, enabling flexible and adaptive use in various environments. We assess the integration of multiple deep learning models, evaluating their suitability for deployment in real-world robotic setups. Our algorithm performs robustly across different tasks, combining language processing with visual grounding. In addition, we release a small dataset of video recordings used to evaluate the system. This dataset captures real-world interactions in which a human provides instructions in natural language to a robot, a contribution to future research on human-robot interaction. We discuss the strengths and limitations of the system, with particular focus on how it handles multimodal command interpretation, and its ability to be integrated into symbolic robotic frameworks for safe and explainable decision-making.
Abstract:Sustainability is no longer a matter of choice but is invariably linked to the survival of the entire ecosystem of our planet Earth. As robotics technology is growing at an exponential rate, it is crucial to examine its implications for sustainability. Our focus is on social sustainability, specifically analyzing the role of robotics technology in this domain by identifying six distinct ways robots influence social sustainability.
Abstract:As more and more social robots are being used for collaborative activities with humans, it is crucial to investigate mechanisms to facilitate trust in the human-robot interaction. One such mechanism is humour: it has been shown to increase creativity and productivity in human-human interaction, which has an indirect influence on trust. In this study, we investigate if humour can increase trust in human-robot interaction. We conducted a between-subjects experiment with 40 participants to see if the participants are more likely to accept the robot's suggestion in the Three-card Monte game, as a trust check task. Though we were unable to find a significant effect of humour, we discuss the effect of possible confounding variables, and also report some interesting qualitative observations from our study: for instance, the participants interacted effectively with the robot as a team member, regardless of the humour or no-humour condition.
Abstract:Spoken language is the most natural way for a human to communicate with a robot. It may seem intuitive that a robot should communicate with users in their native language. However, it is not clear if a user's perception of a robot is affected by the language of interaction. We investigated this question by conducting a study with twenty-three native Czech participants who were also fluent in English. The participants were tasked with instructing the Pepper robot on where to place objects on a shelf. The robot was controlled remotely using the Wizard-of-Oz technique. We collected data through questionnaires, video recordings, and a post-experiment feedback session. The results of our experiment show that people perceive an English-speaking robot as more intelligent than a Czech-speaking robot (z = 18.00, p-value = 0.02). This finding highlights the influence of language on human-robot interaction. Furthermore, we discuss the feedback obtained from the participants via the post-experiment sessions and its implications for HRI design.
Abstract:As child-robot interactions become increasingly common in everyday environments, it is important to examine how a robot's errors influence children's behavior. We explored how a robot's unexpected behaviors affect child-robot interactions during two workshops on active reading: one in a modern art museum and one in a school. We observed the behavior and attitudes of 42 children from three age groups: 6-7 years, 8-10 years, and 10-12 years. Through our observations, we identified six different types of surprising robot behaviors: personality, movement malfunctions, inconsistent behavior, mispronunciation, delays, and freezing. Using a qualitative analysis, we examined how children responded to each type of behavior, and we observed similarities and differences between the age groups. Based on our findings, we propose guidelines for designing age-appropriate learning interactions with social robots.
Abstract:Telling lies and faking emotions are quite common in human-human interactions: though there are risks, in many situations such behaviours provide social benefits. In recent years, there have been many social robots and chatbots that fake emotions or behave deceptively with their users. In this paper, I present a few examples of such robots and chatbots, and analyze their ethical aspects. Three scenarios are presented where some kind of lying or deceptive behaviour might be justified. Then five approaches to deceptive behaviours (no deception, blatant deception, tactful deception, nudging, and self-deception) are discussed and their implications are analyzed. I conclude by arguing that we need to develop localized and culture-specific solutions to incorporating deception in social robots and chatbots.
Abstract:More and more stores in Poland are adopting robots as customer assistants or promotional tools. However, customer attitudes to such novelty remain unexplored. This study focused on the role of social robots in self-service cafes. This domain has not been explored in Poland before, and there is not much research in other countries either. We conducted a field study in two cafes with a teleoperated Nao robot, which sat next to the counter and served as an assistant to a human barista. We observed customer behavior, conducted semi-structured interviews, and administered questionnaires to the customers. The results show that Polish customers are neutral and insecure about robots. However, they do not exhibit a total dislike of these technologies. We considered three stages of the interaction and identified features of each stage that need to be designed carefully to yield user satisfaction.
Abstract:Skeleton Ground Truth (GT) is critical to the success of supervised skeleton extraction methods, especially with the popularity of deep learning techniques. Furthermore, we see skeleton GTs used not only for training skeleton detectors with Convolutional Neural Networks (CNN) but also for evaluating skeleton-related pruning and matching algorithms. However, most existing shape and image datasets suffer from the lack of skeleton GT and inconsistency of GT standards. As a result, it is difficult to evaluate and reproduce CNN-based skeleton detectors and algorithms on a fair basis. In this paper, we present a heuristic strategy for object skeleton GT extraction in binary shapes and natural images. Our strategy is built on an extended theory of diagnosticity hypothesis, which enables encoding human-in-the-loop GT extraction based on clues from the target's context, simplicity, and completeness. Using this strategy, we developed a tool, SkeView, to generate skeleton GT of 17 existing shape and image datasets. The GTs are then structurally evaluated with representative methods to build viable baselines for fair comparisons. Experiments demonstrate that GTs generated by our strategy yield promising quality with respect to standard consistency, and also provide a balance between simplicity and completeness.
Abstract:The text-independent approach to writer identification does not require the writer to write any predetermined text. Previous research on text-independent writer identification has been based on identifying writer-specific features designed by experts. However, in the last decade, deep learning methods have been successfully applied to learn features from data automatically. We propose here an end-to-end deep-learning method for text-independent writer identification that does not require prior identification of features. A Convolutional Neural Network (CNN) is trained initially to extract local features, which represent characteristics of individual handwriting in the whole character images and their sub-regions. Randomly sampled tuples of images from the training set are used to train the CNN, and the local features extracted from the images in each tuple are aggregated to form global features. For every training epoch, the process of randomly sampling tuples is repeated, which is equivalent to preparing a large number of training patterns for training the CNN for text-independent writer identification. We conducted experiments on the JEITA-HP database of offline handwritten Japanese character patterns. With 200 characters, our method achieved an accuracy of 99.97% in classifying 100 writers. Even when using 50 characters for 100 writers or 100 characters for 400 writers, our method achieved accuracy levels of 92.80% and 93.82%, respectively. We conducted further experiments on the Firemaker and IAM databases of offline handwritten English text. Using only one page per writer for training, our method achieved over 91.81% accuracy in classifying 900 writers. Overall, we achieved a better performance than the previously published best result based on handcrafted features and clustering algorithms, which demonstrates the effectiveness of our method for handwritten English text as well.
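The per-epoch tuple sampling and feature aggregation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sample_tuples` and `aggregate` are hypothetical, plain lists of floats stand in for the CNN's local features, tuples are assumed to be drawn from a single writer's images, and the element-wise mean is an assumed aggregation, since the abstract does not say which one is used.

```python
import random
from collections import defaultdict


def sample_tuples(samples, tuple_size, rng):
    """Draw random same-writer tuples from (features, writer_id) pairs.

    Intended to be called once per epoch, so every epoch sees a fresh
    set of tuples (assumption: tuples never mix writers).
    """
    by_writer = defaultdict(list)
    for features, writer in samples:
        by_writer[writer].append(features)

    tuples = []
    for writer, items in by_writer.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Leftover items that cannot fill a complete tuple are dropped.
        for i in range(0, len(shuffled) - tuple_size + 1, tuple_size):
            tuples.append((shuffled[i:i + tuple_size], writer))
    return tuples


def aggregate(local_features):
    """Element-wise mean of local feature vectors (assumed aggregation)."""
    n = len(local_features)
    return [sum(column) / n for column in zip(*local_features)]


# Toy data: 2 writers, 4 two-dimensional "local feature" vectors each.
samples = [([float(w), float(k)], w) for w in (0, 1) for k in range(4)]
rng = random.Random(0)

for epoch in range(2):
    epoch_tuples = sample_tuples(samples, tuple_size=2, rng=rng)
    global_features = [(aggregate(feats), w) for feats, w in epoch_tuples]
```

Because the sampling is repeated every epoch, the same small training set yields many different tuples over the course of training, which is the effect the abstract describes as equivalent to preparing a large number of training patterns.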
Abstract:Recent progress in machine learning techniques has revived interest in building artificial general intelligence using these particular tools. There has been tremendous success in applying them to narrow intellectual tasks such as pattern recognition, natural language processing and playing Go. In the last of these, programs have vastly outperformed the strongest human players in recent years. However, these tasks are formalized by people in such ways that it has become "easy" for automated recipes to find better solutions than humans do. In the sense of John Searle's Chinese Room Argument, the computer playing Go does not actually understand anything about the game. Thinking like a human mind requires going beyond the curve-fitting paradigm of current systems. There is a fundamental limit to what they can currently achieve, as only very specific problem formalizations can increase their performance on particular tasks. In this paper, we argue that one of the most important aspects of the human mind is its capacity for logical thinking, which gives rise to many intellectual expressions that differentiate us from animal brains. We propose to model the emergence of logical thinking based on Piaget's theory of cognitive development.