Abstract:Brain-Computer Interfaces (BCIs) can decode imagined speech from neural activity. However, these systems typically require extensive training sessions where participants imaginedly repeat words, leading to mental fatigue and difficulties identifying the onset of words, especially when imagining sequences of words. This paper addresses these challenges by transferring a classifier trained in overt speech data to covert speech classification. We used electroencephalogram (EEG) features derived from the Hilbert envelope and temporal fine structure, and used them to train a bidirectional long-short-term memory (BiLSTM) model for classification. Our method reduces the burden of extensive training and achieves state-of-the-art classification accuracy: 86.44% for overt speech and 79.82% for covert speech using the overt speech classifier.
Abstract:The 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in the people's mind and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach we first run a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the generated images by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was used also for the final study. In this study, we investigated whether providing such images enhances the retention of vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.
Abstract:Visual art (VA) recommendation is complex, as it has to consider the interests of users (e.g. museum visitors) and other stakeholders (e.g. museum curators). We study how to effectively account for key stakeholders in VA recommendations while also considering user-centred measures such as novelty, serendipity, and diversity. We propose MOSAIC, a novel multimodal multistakeholder-aware approach using state-of-the-art CLIP and BLIP backbone architectures and two joint optimisation objectives: popularity and representative selection of paintings across different categories. We conducted an offline evaluation using preferences elicited from 213 users followed by a user study with 100 crowdworkers. We found a strong effect of popularity, which was positively perceived by users, and a minimal effect of representativeness. MOSAIC's impact extends beyond visitors, benefiting various art stakeholders. Its user-centric approach has broader applicability, offering advancements for content recommendation across domains that require considering multiple stakeholders.
Abstract:Present Brain-Computer Interfacing (BCI) technology allows inference and detection of cognitive and affective states, but fairly little has been done to study scenarios in which such information can facilitate new applications that rely on modeling human cognition. One state that can be quantified from various physiological signals is attention. Estimates of human attention can be used to reveal preferences and novel dimensions of user experience. Previous approaches have tackled these incredibly challenging tasks using a variety of behavioral signals, from dwell-time to click-through data, and computational models of visual correspondence to these behavioral signals. However, behavioral signals are only rough estimations of the real underlying attention and affective preferences of the users. Indeed, users may attend to some content simply because it is salient, but not because it is really interesting, or simply because it is outrageous. With this paper, we put forward a research agenda and example work using BCI to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience. Subsequently, we link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
Abstract:Modeling visual saliency in graphical user interfaces (GUIs) allows to understand how people perceive GUI designs and what elements attract their attention. One aspect that is often overlooked is the fact that computational models depend on a series of design parameters that are not straightforward to decide. We systematically analyze how different design parameters affect scanpath evaluation metrics using a state-of-the-art computational model (DeepGaze++). We particularly focus on three design parameters: input image size, inhibition-of-return decay, and masking radius. We show that even small variations of these design parameters have a noticeable impact on standard evaluation metrics such as DTW or Eyenalysis. These effects also occur in other scanpath models, such as UMSS and ScanGAN, and in other datasets such as MASSVIS. Taken together, our results put forward the impact of design decisions for predicting users' viewing behavior on GUIs.
Abstract:From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.
Abstract:Staying in the intensive care unit (ICU) is often traumatic, leading to post-intensive care syndrome (PICS), which encompasses physical, psychological, and cognitive impairments. Currently, there are limited interventions available for PICS. Studies indicate that exposure to visual art may help address the psychological aspects of PICS and be more effective if it is personalized. We develop Machine Learning-based Visual Art Recommendation Systems (VA RecSys) to enable personalized therapeutic visual art experiences for post-ICU patients. We investigate four state-of-the-art VA RecSys engines, evaluating the relevance of their recommendations for therapeutic purposes compared to expert-curated recommendations. We conduct an expert pilot test and a large-scale user study (n=150) to assess the appropriateness and effectiveness of these recommendations. Our results suggest all recommendations enhance temporal affective states. Visual and multimodal VA RecSys engines compare favourably with expert-curated recommendations, indicating their potential to support the delivery of personalized art therapy for PICS prevention and treatment.
Abstract:Consciousness has been historically a heavily debated topic in engineering, science, and philosophy. On the contrary, awareness had less success in raising the interest of scholars in the past. However, things are changing as more and more researchers are getting interested in answering questions concerning what awareness is and how it can be artificially generated. The landscape is rapidly evolving, with multiple voices and interpretations of the concept being conceived and techniques being developed. The goal of this paper is to summarize and discuss the ones among these voices connected with projects funded by the EIC Pathfinder Challenge called ``Awareness Inside'', a nonrecurring call for proposals within Horizon Europe designed specifically for fostering research on natural and synthetic awareness. In this perspective, we dedicate special attention to challenges and promises of applying synthetic awareness in robotics, as the development of mature techniques in this new field is expected to have a special impact on generating more capable and trustworthy embodied systems.
Abstract:Artwork recommendation is challenging because it requires understanding how users interact with highly subjective content, the complexity of the concepts embedded within the artwork, and the emotional and cognitive reflections they may trigger in users. In this paper, we focus on efficiently capturing the elements (i.e., latent semantic relationships) of visual art for personalized recommendation. We propose and study recommender systems based on textual and visual feature learning techniques, as well as their combinations. We then perform a small-scale and a large-scale user-centric evaluation of the quality of the recommendations. Our results indicate that textual features compare favourably with visual ones, whereas a fusion of both captures the most suitable hidden semantic relationships for artwork recommendation. Ultimately, this paper contributes to our understanding of how to deliver content that suitably matches the user's interests and how they are perceived.
Abstract:Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs to the user -- for example, due to surprise or relearning effort -- or "trap" the process to a suboptimal design immaturely. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: It finds beneficial changes when there are such and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.