Abstract:Multimodal information-gathering settings, where users collaborate with AI in dynamic environments, are increasingly common. These involve complex processes with textual and multimodal interactions, often requiring additional structural information via cost-incurring requests. AI helpers lack access to users' true goals, beliefs, and preferences and struggle to integrate diverse information effectively. We propose a social continual learning framework for causal knowledge acquisition and collaborative decision-making. It focuses on autonomous agents learning through dialogues, question-asking, and interaction in open, partially observable environments. A key component is a natural language oracle that answers the agent's queries about environmental mechanisms and states, refining causal understanding while balancing exploration or learning, and exploitation or knowledge use. Evaluation tasks inspired by developmental psychology emphasize causal reasoning and question-asking skills. They complement benchmarks by assessing the agent's ability to identify knowledge gaps, generate meaningful queries, and incrementally update reasoning. The framework also evaluates how knowledge acquisition costs are amortized across tasks within the same environment. We propose two architectures: 1) a system combining Large Language Models (LLMs) with the ReAct framework and question-generation, and 2) an advanced system with a causal world model, symbolic, graph-based, or subsymbolic, for reasoning and decision-making. The latter builds a causal knowledge graph for efficient inference and adaptability under constraints. Challenges include integrating causal reasoning into ReAct and optimizing exploration and question-asking in error-prone scenarios. Beyond applications, this framework models developmental processes combining causal reasoning, question generation, and social learning.
Abstract:The purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds up on a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, model-driven design strategies as well as for AR and VR applications.