Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Antti Oulasvirta

Department of Communications and Networking, Aalto University, Finland

Controllable GUI Exploration

Feb 05, 2025

Aryan Garg, Yue Jiang, Antti Oulasvirta

Figure 1 for Controllable GUI Exploration

Figure 2 for Controllable GUI Exploration

Figure 3 for Controllable GUI Exploration

Figure 4 for Controllable GUI Exploration

Abstract:During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input-specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

Via

Access Paper or Ask Questions

AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Oct 25, 2024

Francisco Erivaldo Fernandes Junior, Antti Oulasvirta

Figure 1 for AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Figure 2 for AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Figure 3 for AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Figure 4 for AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Abstract:Developing a reinforcement learning (RL) agent often involves identifying effective values for a large number of parameters, covering the policy, reward function, environment, and the agent's internal architecture, such as parameters controlling how the peripheral vision and memory modules work. Critically, since these parameters are interrelated in complex ways, optimizing them can be viewed as a black box optimization problem, which is especially challenging for non-experts. Although existing optimization-as-a-service platforms (e.g., Vizier, Optuna) can handle such problems, they are impractical for RL systems, as users must manually map each parameter to different components, making the process cumbersome and error-prone. They also require deep understanding of the optimization process, limiting their application outside ML experts and restricting access for fields like cognitive science, which models human decision-making. To tackle these challenges, we present AgentForge, a flexible low-code framework to optimize any parameter set across an RL system. AgentForge allows the user to perform individual or joint optimization of parameter sets. An optimization problem can be defined in a few lines of code and handed to any of the interfaced optimizers. We evaluated its performance in a challenging vision-based RL problem. AgentForge enables practitioners to develop RL agents without requiring extensive coding or deep expertise in optimization.

* This preprint was submitted to the 17th International Conference on Agents and Artificial Intelligence (ICAART'2025) and is currently under review

Via

Access Paper or Ask Questions

Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Apr 21, 2024

Yue Jiang, Changkong Zhou, Vikas Garg, Antti Oulasvirta

Figure 1 for Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Figure 2 for Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Figure 3 for Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Figure 4 for Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Abstract:Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

* 18 pages

Via

Access Paper or Ask Questions

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Apr 15, 2024

Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

Abstract:From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.

Via

Access Paper or Ask Questions

Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perception

Feb 06, 2024

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen, Antti Oulasvirta, Gustav Markkula

Abstract:This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.

Via

Access Paper or Ask Questions

Modeling human road crossing decisions as reward maximization with visual perception limitations

Jan 27, 2023

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen, Antti Oulasvirta, Gustav Markkula

Figure 1 for Modeling human road crossing decisions as reward maximization with visual perception limitations

Figure 2 for Modeling human road crossing decisions as reward maximization with visual perception limitations

Figure 3 for Modeling human road crossing decisions as reward maximization with visual perception limitations

Figure 4 for Modeling human road crossing decisions as reward maximization with visual perception limitations

Abstract:Understanding the interaction between different road users is critical for road safety and automated vehicles (AVs). Existing mathematical models on this topic have been proposed based mostly on either cognitive or machine learning (ML) approaches. However, current cognitive models are incapable of simulating road user trajectories in general scenarios, and ML models lack a focus on the mechanisms generating the behavior and take a high-level perspective which can cause failures to capture important human-like behaviors. Here, we develop a model of human pedestrian crossing decisions based on computational rationality, an approach using deep reinforcement learning (RL) to learn boundedly optimal behavior policies given human constraints, in our case a model of the limited human visual system. We show that the proposed combined cognitive-RL model captures human-like patterns of gap acceptance and crossing initiation time. Interestingly, our model's decisions are sensitive to not only the time gap, but also the speed of the approaching vehicle, something which has been described as a "bias" in human gap acceptance behavior. However, our results suggest that this is instead a rational adaption to human perceptual limitations. Moreover, we demonstrate an approach to accounting for individual differences in computational rationality models, by conditioning the RL policy on the parameters of the human constraints. Our results demonstrate the feasibility of generating more human-like road user behavior by combining RL with cognitive models.

* 6 pages, 5 figures,1 table, manuscript created for consideration at IEEE IV 2023 conference

Via

Access Paper or Ask Questions

Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Apr 15, 2022

Liwei Chan, Yi-Chi Liao, George B. Mo, John J. Dudley, Chun-Lien Cheng, Per Ola Kristensson, Antti Oulasvirta

Figure 1 for Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Figure 2 for Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Figure 3 for Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Figure 4 for Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Abstract:Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design, however they assume a model of the interaction domain. Black box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The question stands out if such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution, however they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.

* CHI 2022

Via

Access Paper or Ask Questions

Rediscovering Affordance: A Reinforcement Learning Perspective

Jan 07, 2022

Yi-Chi Liao, Kashyap Todi, Aditya Acharya, Antti Keurulainen, Andrew Howes, Antti Oulasvirta

Figure 1 for Rediscovering Affordance: A Reinforcement Learning Perspective

Figure 2 for Rediscovering Affordance: A Reinforcement Learning Perspective

Figure 3 for Rediscovering Affordance: A Reinforcement Learning Perspective

Figure 4 for Rediscovering Affordance: A Reinforcement Learning Perspective

Abstract:Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., "rotating" a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widgets tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.

* 15 pages, In proceedings of the ACM CHI 2022

Via

Access Paper or Ask Questions

Toward AI Assistants That Let Designers Design

Jul 22, 2021

Sebastiaan De Peuter, Antti Oulasvirta, Samuel Kaski

Figure 1 for Toward AI Assistants That Let Designers Design

Figure 2 for Toward AI Assistants That Let Designers Design

Figure 3 for Toward AI Assistants That Let Designers Design

Abstract:AI for supporting designers needs to be rethought. It should aim to cooperate, not automate, by supporting and leveraging the creativity and problem-solving of designers. The challenge for such AI is how to infer designers' goals and then help them without being needlessly disruptive. We present AI-assisted design: a framework for creating such AI, built around generative user models which enable reasoning about designers' goals, reasoning, and capabilities.

* 9 pages, 3 figures, submitted to IEEE Computer magazine

Via

Access Paper or Ask Questions

Adapting User Interfaces with Model-based Reinforcement Learning

Mar 11, 2021

Kashyap Todi, Gilles Bailly, Luis A. Leiva, Antti Oulasvirta

Figure 1 for Adapting User Interfaces with Model-based Reinforcement Learning

Figure 2 for Adapting User Interfaces with Model-based Reinforcement Learning

Figure 3 for Adapting User Interfaces with Model-based Reinforcement Learning

Figure 4 for Adapting User Interfaces with Model-based Reinforcement Learning

Abstract:Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs to the user -- for example, due to surprise or relearning effort -- or "trap" the process to a suboptimal design immaturely. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: It finds beneficial changes when there are such and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.

* 13 pages, 10 figures, ACM CHI 2021 Full Paper

Via

Access Paper or Ask Questions