Abstract:Effective long-term strategies enable AI systems to navigate complex environments by making sequential decisions over extended horizons. Similarly, reinforcement learning (RL) agents optimize decisions across sequences to maximize rewards, even without immediate feedback. To verify that Latent Diffusion-Constrained Q-learning (LDCQ), a prominent diffusion-based offline RL method, demonstrates strong reasoning abilities in multi-step decision-making, we evaluate its performance on the Abstraction and Reasoning Corpus (ARC). However, applying offline RL methodologies to enhance strategic reasoning in AI for solving ARC tasks is challenging due to the lack of sufficient experience data in the ARC training set. To address this limitation, we introduce an augmented offline RL dataset for ARC, called Synthesized Offline Learning Data for Abstraction and Reasoning (SOLAR), along with the SOLAR-Generator, which generates diverse trajectory data based on predefined rules. SOLAR enables the application of offline RL methods by offering sufficient experience data. We synthesized SOLAR for a simple task and used it to train an agent with the LDCQ method. Our experiments demonstrate the effectiveness of the offline RL approach on a simple ARC task, showing the agent's ability to make multi-step sequential decisions and correctly identify answer states. These results highlight the potential of the offline RL approach to enhance AI's strategic reasoning capabilities.
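As an illustration of the kind of rule-based trajectory synthesis described above, the following Python sketch generates offline RL transitions for a toy grid-recoloring task. The task, action encoding, and reward scheme are hypothetical simplifications for exposition; they are not the actual SOLAR-Generator.

```python
import numpy as np

def synthesize_trajectories(n_episodes=1000, grid_size=3, seed=0):
    """Generate (state, action, reward, next_state, done) tuples for a toy
    grid-recoloring task by following a predefined rule (hypothetical example)."""
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(n_episodes):
        grid = rng.integers(0, 3, size=(grid_size, grid_size))  # random input grid
        target = (grid + 1) % 3                                  # rule: shift every color by one
        state = grid.copy()
        cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
        order = rng.permutation(len(cells))                      # visit cells in random order
        for i, idx in enumerate(order):
            r, c = cells[idx]
            action = (r, c, int(target[r, c]))                   # paint cell (r, c) with its target color
            next_state = state.copy()
            next_state[r, c] = target[r, c]
            done = i == len(cells) - 1
            reward = 1.0 if done and np.array_equal(next_state, target) else 0.0
            dataset.append((state, action, reward, next_state, done))
            state = next_state
    return dataset

trajectories = synthesize_trajectories()
print(len(trajectories), "transitions")
```

Varying the rule, the traversal order, and the grid contents is one way such a generator can produce diverse trajectories from a small set of predefined task specifications.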
Abstract:We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making. DIAR introduces an Adaptive Revaluation mechanism that dynamically adjusts decision lengths by comparing current and future state values, enabling flexible long-term decision-making. Furthermore, we address Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model. The diffusion model generates diverse latent trajectories, enhancing policy robustness and generalization. As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
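A minimal sketch of the Adaptive Revaluation idea, assuming a learned value function and a decoder that predicts the state reached after executing a latent plan; both networks and the decision rule below are placeholders for exposition, not DIAR's exact formulation.

```python
import torch
import torch.nn as nn

state_dim, latent_dim = 8, 4

# Stand-in networks; in the actual framework these would be the learned value
# function and the diffusion decoder that maps a latent plan to future states.
value_fn = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
plan_decoder = nn.Sequential(nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
                             nn.Linear(64, state_dim))  # predicts the state after the plan

def adaptive_revaluation(state, latent_plan):
    """Keep executing the current latent plan only while the predicted future
    state is at least as valuable as the current one; otherwise replan now."""
    with torch.no_grad():
        v_now = value_fn(state)
        future_state = plan_decoder(torch.cat([state, latent_plan], dim=-1))
        v_future = value_fn(future_state)
    return "execute_full_plan" if v_future >= v_now else "replan_early"

state = torch.randn(state_dim)
latent_plan = torch.randn(latent_dim)
print(adaptive_revaluation(state, latent_plan))
```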
Abstract:While significant progress has been made in task-specific applications, current models struggle with deep reasoning, generality, and adaptation -- key components of System-2 reasoning that are crucial for achieving Artificial General Intelligence (AGI). Despite the promise of approaches such as program synthesis, language models, and transformers, these methods often fail to generalize beyond their training data and to adapt to novel tasks, limiting their ability to perform human-like reasoning. This paper explores the limitations of existing approaches in achieving advanced System-2 reasoning and highlights the importance of generality and adaptation for AGI. Moreover, we propose four key research directions to address these gaps: (1) learning human intentions from action sequences, (2) combining symbolic and neural models, (3) meta-learning for unfamiliar environments, and (4) reinforcement learning for multi-step reasoning. Through these directions, we aim to advance the ability to generalize and adapt, bringing computational models closer to the reasoning capabilities required for AGI.
Abstract:We address an inverse problem in modeling holographic superconductors. We focus our research on the critical temperature behavior depicted by experiments. We use a physics-informed neural network method to find a mass function $M(F^2)$, which is necessary to understand phase transition behavior. This mass function describes a nonlinear interaction between superconducting order and charge carrier density. We introduce positional embedding layers to improve the learning process in our algorithm, and the Adam optimizer is used to predict the critical temperature data via holographic calculation with appropriate accuracy. The use of positional embedding layers is motivated by the transformer model of natural-language processing in the artificial intelligence (AI) field. We obtain holographic models that reproduce the boundary between the normal and superconducting phases provided by actual data. Our work is the first holographic attempt to quantitatively match phase transition data obtained from experiments. The present work also offers a new methodology for data-based holographic models.
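The following sketch illustrates only the positional (sinusoidal) embedding idea applied to a scalar input such as $F^2$, with an MLP ansatz for $M(F^2)$ trained by Adam. The placeholder target below stands in for the critical temperature data; in the actual method, the loss would couple the network output to the holographic calculation rather than to a direct regression target.

```python
import torch
import torch.nn as nn

class FourierEmbedding(nn.Module):
    """Transformer-style sinusoidal embedding of a scalar input such as F^2."""
    def __init__(self, n_freqs=8):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs))

    def forward(self, x):                       # x: (batch, 1)
        angles = x * self.freqs                 # (batch, n_freqs)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

class MassFunction(nn.Module):
    """Neural ansatz for M(F^2) built on top of the positional embedding."""
    def __init__(self, n_freqs=8, hidden=64):
        super().__init__()
        self.embed = FourierEmbedding(n_freqs)
        self.mlp = nn.Sequential(nn.Linear(2 * n_freqs, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, f2):
        return self.mlp(self.embed(f2))

model = MassFunction()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder data standing in for the measured critical temperatures.
f2 = torch.linspace(0.0, 1.0, 32).unsqueeze(-1)
tc_target = torch.exp(-f2)

for step in range(200):
    loss = nn.functional.mse_loss(model(f2), tc_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```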
Abstract:The effectiveness of AI model training hinges on the quality of the trajectory data used, particularly in aligning the model's decisions with human intentions. However, in human task-solving trajectories, we observe significant misalignments between human intentions and the recorded trajectories, which can undermine AI model training. This paper addresses these misalignments by proposing a visualization tool and a heuristic algorithm designed to detect and categorize discrepancies in trajectory data. Although the heuristic algorithm requires a set of predefined human intentions to function, which we currently cannot extract, the visualization tool offers valuable insights into the nature of these misalignments. We expect that eliminating these misalignments could significantly improve the utility of trajectory data for AI model training. We also propose that future work should focus on developing methods, such as Topic Modeling, to accurately extract human intentions from trajectory data, thereby enhancing the alignment between user actions and AI learning processes.
Abstract:This paper demonstrates that model-based reinforcement learning (model-based RL) is a suitable approach for the task of analogical reasoning. We hypothesize that model-based RL can solve analogical reasoning tasks more efficiently through the creation of internal models. To test this, we compared DreamerV3, a model-based RL method, with Proximal Policy Optimization, a model-free RL method, on the Abstraction and Reasoning Corpus (ARC) tasks. Our results indicate that model-based RL not only outperforms model-free RL in learning and generalizing from single tasks but also shows significant advantages in reasoning across similar tasks.
Abstract:This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents several challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent trained with proximal policy optimization can learn individual tasks through ARCLE. The adoption of non-factorial policies and auxiliary losses led to performance enhancements, effectively mitigating issues associated with the action space and goal attainment. Based on these insights, we propose several research directions and motivations for using ARCLE, including MAML, GFlowNets, and World Models.
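A minimal interaction loop with an ARCLE environment through the Gymnasium API might look as follows. The environment ID shown is an assumption and should be checked against the installed ARCLE version, and the random policy merely stands in for a trained PPO agent.

```python
import gymnasium as gym
import arcle  # importing the package is assumed to register ARCLE environments

# The environment ID below is illustrative; consult the ARCLE documentation
# for the exact registered IDs and the observation/action formats.
env = gym.make("ARCLE/O2ARCv2Env-v0")

obs, info = env.reset()
for _ in range(10):
    action = env.action_space.sample()  # random action; a PPO policy would go here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```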
Abstract:The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process itself. We introduce a new approach using the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with those of humans. Experimental results confirm that large language models still possess weak inference abilities, lagging in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs and propose development paths for achieving human-level reasoning.
Abstract:In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.
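A simplified sketch of the imitation-learning setup described above: a causal transformer is trained to predict the next human action from the state history. Return-to-go conditioning and the Push and Pull clustering are omitted, and all dimensions and architecture choices are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

state_dim, n_actions, max_len = 16, 10, 32   # illustrative sizes

class ActionCloner(nn.Module):
    """Causal transformer that imitates human traces by predicting the next
    action from the history of states (simplified Decision-Transformer-style setup)."""
    def __init__(self, d_model=64):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, states):                            # states: (batch, T, state_dim)
        T = states.size(1)
        pos = torch.arange(T, device=states.device)
        x = self.state_proj(states) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(T)  # causal attention mask
        h = self.encoder(x, mask=mask)
        return self.action_head(h)                        # action logits at every step

model = ActionCloner()
states = torch.randn(8, 12, state_dim)                    # dummy human trajectories
actions = torch.randint(0, n_actions, (8, 12))            # dummy human actions
loss = nn.functional.cross_entropy(model(states).flatten(0, 1), actions.flatten())
loss.backward()
```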
Abstract:A mesh generation method that can generate an optimal mesh for a blade passage in a single attempt is developed using deep reinforcement learning (DRL). Unlike conventional methods, where meshing parameters must be specified by the user or iteratively optimized from scratch for each newly given geometry, the developed method employs DRL-based multi-condition (MC) optimization to define meshing parameters optimally for various geometries. The method involves the following steps: (1) development of a base algorithm for structured mesh generation of a blade passage; (2) formulation of an MC optimization problem to optimize the meshing parameters introduced while developing the base algorithm; and (3) development of a DRL-based mesh generation algorithm by solving the MC optimization problem with DRL. As a result, the developed algorithm successfully generates optimal meshes in a single trial for various blades.
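To illustrate the multi-condition idea, the sketch below trains a single policy that maps a geometry descriptor to meshing parameters using a generic one-step policy-gradient update. The geometry descriptor, parameter count, and mesh-quality reward are placeholders, and the actual method may use a different DRL algorithm and a real mesh-quality evaluation.

```python
import torch
import torch.nn as nn

geom_dim, n_params = 6, 4          # geometry descriptor and meshing-parameter sizes (illustrative)

# One policy maps a geometry condition to a distribution over meshing parameters,
# so a single network covers many blade geometries (the multi-condition idea).
policy = nn.Sequential(nn.Linear(geom_dim, 64), nn.ReLU(), nn.Linear(64, 2 * n_params))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def mesh_quality(geometry, params):
    """Placeholder reward; the real method would generate the blade-passage mesh
    and score it on criteria such as skewness and smoothness."""
    return -((params - geometry[..., :n_params]) ** 2).sum(-1)

for step in range(500):
    geometry = torch.rand(32, geom_dim)                       # sample geometry conditions
    mean, log_std = policy(geometry).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.exp())
    params = dist.sample()                                    # propose meshing parameters
    reward = mesh_quality(geometry, params)
    loss = -(dist.log_prob(params).sum(-1) * reward).mean()   # REINFORCE-style update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```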