Linjie Xu

No Need to Train Your RDB Foundation Model

Feb 14, 2026

Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, David Wipf

Abstract: Relational databases (RDBs) contain vast amounts of heterogeneous tabular information that can be exploited for predictive modeling purposes. But since the space of potential targets is vast across enterprise settings, how can we avoid retraining a new model each time we wish to predict a new quantity of interest? Foundation models based on in-context learning (ICL) offer a convenient option, but so far are largely restricted to single-table operability. In generalizing to multiple interrelated tables, it is essential to compress variably-sized RDB neighborhoods into fixed-length ICL samples for consumption by the decoder. However, the details here are critical: unlike existing supervised learning RDB pipelines, we provide theoretical and empirical evidence that ICL-specific compression should be constrained within high-dimensional RDB columns where all entities share units and roles, not across columns where the relevance of heterogeneous data types cannot possibly be determined without label information. Conditioned on this restriction, we then demonstrate that encoder expressiveness is actually not compromised by excluding trainable parameters. Hence we arrive at a principled family of RDB encoders that can be seamlessly paired with already-existing single-table ICL foundation models, whereby no training or fine-tuning is required. From a practical standpoint, we develop scalable SQL primitives to implement the encoder stage, resulting in an easy-to-use open-source RDB foundation model (https://github.com/HKUSHXLab/rdblearn) capable of robust performance on unseen datasets out of the box.
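
As a rough illustration of within-column compression, the sketch below turns a variably-sized set of child rows into a fixed-length feature vector by applying parameter-free aggregators to each column separately. The table contents, column names, function names, and the choice of aggregators (count, mean, min, max) are assumptions made for illustration; this is not the rdblearn API and may differ from the encoder described in the paper.

```python
from statistics import mean

# Hypothetical child table: orders linked to customers by a foreign key.
orders = [
    {"customer_id": 1, "amount": 20.0, "items": 2},
    {"customer_id": 1, "amount": 40.0, "items": 4},
    {"customer_id": 2, "amount": 15.0, "items": 1},
]

# Parameter-free aggregators (an assumed choice, nothing is trained).
AGGREGATORS = {"mean": mean, "min": min, "max": max}


def encode_neighborhood(rows, columns):
    """Compress any number of child rows into one fixed-length feature dict.

    Each aggregator is applied within a single column, so values with
    different units or roles are never mixed.
    """
    features = {"count": len(rows)}
    for col in columns:
        values = [row[col] for row in rows]
        for name, fn in AGGREGATORS.items():
            # An empty neighborhood yields a missing value, not an error.
            features[f"{col}_{name}"] = fn(values) if values else None
    return features


def encode_parents(parent_ids, child_rows, fk, columns):
    """Build one fixed-length sample per parent entity."""
    return {
        pid: encode_neighborhood([r for r in child_rows if r[fk] == pid], columns)
        for pid in parent_ids
    }


samples = encode_parents([1, 2, 3], orders, "customer_id", ["amount", "items"])
```

Every parent ends up with the same seven features regardless of how many child rows it has, so each row could in principle be handed to a single-table ICL model as an ordinary tabular sample.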

C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front

Oct 03, 2024

Ruohong Liu, Yuxin Pan, Linjie Xu, Lei Song, Pengcheng You, Yize Chen, Jiang Bian

Abstract: Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previously dominant MORL methods typically generate a fixed policy set or preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure the efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimensions of the state and preference spaces grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each optimized towards its individual preference over the multiple objectives. Then, to fill the remaining vacancies in the Pareto front, constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent advancements in MORL methods, our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).

* 27 pages, 8 figures. In submission to a conference
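
The constrained step described in the C-MORL abstract, maximizing one objective while keeping the others above a threshold, can be sketched on a toy example. The code below is only an illustration under strong simplifying assumptions: it selects among a fixed list of hypothetical return vectors instead of running constrained policy optimization, and the function names and numbers are made up.

```python
def dominates(a, b):
    """True if return vector a is at least as good as b everywhere and differs."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def pareto_front(points):
    """Keep only the return vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def constrained_argmax(candidates, target, thresholds):
    """Pick the candidate that maximizes objective `target` among those
    whose other objectives all meet their thresholds."""
    feasible = [
        c for c in candidates
        if all(c[j] >= t for j, t in enumerate(thresholds) if j != target)
    ]
    return max(feasible, key=lambda c: c[target]) if feasible else None


# Hypothetical two-objective returns of five candidate policies.
candidates = [(1.0, 9.0), (5.0, 5.0), (8.0, 3.0), (9.0, 1.0), (4.0, 4.0)]

front = pareto_front(candidates)
best = constrained_argmax(candidates, target=0, thresholds=(0.0, 3.0))
```

Here `best` is the candidate with the highest first objective whose second objective is at least 3.0; the threshold entry at the target index is ignored.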

Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention

Sep 11, 2024

Wenhao Zhao, Qiushui Xu, Linjie Xu, Lei Song, Jinyu Wang, Chunlai Zhou, Jiang Bian

Abstract: Recently, the pre-training of decision transformers (DT) using a different domain, such as natural language text, has attracted significant attention in offline reinforcement learning (Offline RL). Although this cross-domain pre-training approach achieves superior performance compared to training from scratch in environments that require short-term planning ability, the mechanisms by which pre-training benefits the fine-tuning phase remain unclear. Furthermore, we point out that the cross-domain pre-training approach hinders the extraction of distant information in environments like PointMaze that require long-term planning ability, leading to performance that is much worse than training DT from scratch. This work first analyzes these issues and finds that the Markov Matrix, a component that exists in pre-trained attention heads, is the key to explaining the significant performance disparity of pre-trained models across different planning abilities. Inspired by our analysis, we propose a general method, GPT-DTMA, which equips a pre-trained DT with Mixture of Attention (MoA) to enable adaptive learning and accommodate diverse attention requirements during fine-tuning. Extensive experiments demonstrate the effectiveness of GPT-DTMA: it achieves superior performance in short-term environments compared to baselines, and in long-term environments, it mitigates the negative impact caused by the Markov Matrix, achieving results comparable to those of DT trained from scratch.

Strategy Game-Playing with Size-Constrained State Abstraction

Aug 12, 2024

Linjie Xu, Diego Perez-Liebana, Alexander Dockhorn

Abstract: Playing strategy games is a challenging problem for artificial intelligence (AI). One of the major challenges is the large search space due to a diverse set of game components. In recent works, state abstraction has been applied to search-based game AI and has brought significant performance improvements. State abstraction techniques rely on reducing the search space, e.g., by aggregating similar states. However, the application of these abstractions is hindered because the quality of an abstraction is difficult to evaluate. Previous works hence abandon the abstraction in the middle of the search to not bias the search to a local optimum. This mechanism introduces a hyper-parameter to decide the time to abandon the current state abstraction. In this work, we propose a size-constrained state abstraction (SCSA), an approach that limits the maximum number of nodes being grouped together. We found that with SCSA, the abstraction is not required to be abandoned. Our empirical results on 3 strategy games show that the SCSA agent outperforms the previous methods and yields robust performance over different games. Code is open-sourced at https://github.com/GAIGResearch/Stratega.

* 8 pages, to be published in Proceedings of the Conference on Games 2024, codes are open-sourced at https://github.com/GAIGResearch/Stratega
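
The core idea of limiting how many nodes may be grouped together can be sketched as follows. This is a minimal illustration, not the Stratega implementation: the `abstraction_key` function standing in for "similar states" and the rule of opening a new group once a group is full are both assumptions.

```python
from collections import defaultdict


def group_nodes(nodes, abstraction_key, max_group_size):
    """Group nodes that share an abstraction key, capping every group's size.

    When the current group for a key is full, a new group is opened for
    that key (an assumed overflow rule, chosen for simplicity).
    """
    groups_by_key = defaultdict(list)  # key -> list of groups
    for node in nodes:
        groups = groups_by_key[abstraction_key(node)]
        if not groups or len(groups[-1]) >= max_group_size:
            groups.append([])
        groups[-1].append(node)
    # Flatten to a plain list of groups.
    return [group for groups in groups_by_key.values() for group in groups]


# Toy example: integers stand in for tree nodes, parity for "similar states".
groups = group_nodes(range(10), abstraction_key=lambda n: n % 2, max_group_size=3)
```

With a cap of 3, the five even "nodes" are split into two groups instead of being merged into one large group, which is the effect the size constraint is meant to have.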

Protecting Your LLMs with Information Bottleneck

Apr 22, 2024

Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian

Abstract: The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these alignments are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only essential information for the target LLMs to respond with the expected answer. Moreover, to be compatible with any LLM, we further consider the situation where the gradient is not visible. Our empirical evaluations show that IBProtector outperforms current defense methods in mitigating jailbreak attempts, without overly affecting response quality or inference speed. Its effectiveness and adaptability across various attack methods and target LLMs underscore the potential of IBProtector as a novel, transferable defense that bolsters the security of LLMs without requiring modifications to the underlying models.

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Apr 15, 2024

Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian

Abstract: One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single-agent RL, achieving sample efficiency in Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate 3 MARL methods on 6 SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The code to reimplement the results presented in this paper is open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.
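
To make the replay ratio concrete, the sketch below counts frames and gradient updates in a schematic training loop. The loop structure, parameter names, and placeholder transitions are assumptions for illustration; no environment or learner is included, and this is not the authors' code.

```python
import random
from collections import deque


def train(num_iterations, frames_per_iteration, updates_per_iteration,
          batch_size=32, seed=0):
    """Schematic training loop returning (frames collected, gradient steps).

    The replay ratio is updates_per_iteration / frames_per_iteration.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=5000)
    frames = 0
    gradient_steps = 0

    for _ in range(num_iterations):
        # Collect a chunk of frames (placeholder transitions).
        for _ in range(frames_per_iteration):
            buffer.append(rng.random())
            frames += 1

        if len(buffer) < batch_size:
            continue  # not enough data to form a batch yet

        # Reuse the collected data for several gradient updates.
        for _ in range(updates_per_iteration):
            batch = rng.sample(list(buffer), batch_size)
            # A real learner would compute a loss on `batch` here.
            gradient_steps += 1

    return frames, gradient_steps


baseline = train(10, frames_per_iteration=50, updates_per_iteration=1)
higher_rr = train(10, frames_per_iteration=50, updates_per_iteration=4)
```

Both runs collect the same number of frames; only the number of gradient updates per collected frame changes, which is the quantity the abstract proposes to increase.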

Mildly Constrained Evaluation Policy for Offline Reinforcement Learning

Jun 06, 2023

Linjie Xu, Zhengyao Jiang, Jinyu Wang, Lei Song, Jiang Bian

Abstract: Offline reinforcement learning (RL) methodologies enforce constraints on the policy to adhere closely to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions during test time. Conventional approaches apply identical constraints for both value learning and test time inference. However, our findings indicate that the constraints suitable for value estimation may in fact be excessively restrictive for action selection during test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test time inference with a more constrained target policy for value estimation. Since the target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP based on TD3-BC [Fujimoto and Gu, 2021] and AWAC [Nair et al., 2020] algorithms. The empirical results on MuJoCo locomotion tasks show that the MCEP significantly outperforms the target policy and achieves results competitive with state-of-the-art offline RL methods. The code is open-sourced at https://github.com/egg-west/MCEP.git.

Elastic Monte Carlo Tree Search with State Abstraction for Strategy Game Playing

May 30, 2022

Linjie Xu, Jorge Hurtado-Grueso, Dominic Jeurissen, Diego Perez Liebana, Alexander Dockhorn

Abstract: Strategy video games challenge AI agents with their combinatorial search space caused by complex game elements. State abstraction is a popular technique that reduces the state space complexity. However, current state abstraction methods for games depend on domain knowledge, making their application to new games expensive. State abstraction methods that require no domain knowledge are studied extensively in the planning domain. However, no evidence shows they scale well with the complexity of strategy games. In this paper, we propose Elastic MCTS, an algorithm that uses state abstraction to play strategy games. In Elastic MCTS, the nodes of the tree are clustered dynamically, first grouped together progressively by state abstraction, and then separated when an iteration threshold is reached. The elastic changes benefit from efficient searching brought by state abstraction but avoid the negative influence of using state abstraction for the whole search. To evaluate our method, we make use of the general strategy games platform Stratega to generate scenarios of varying complexity. Results show that Elastic MCTS outperforms MCTS baselines by a large margin, while reducing the tree size by a factor of 10. Code can be found at: https://github.com/egg-west/Stratega

* 8 pages, 3 figures; published at the IEEE Conference on Games 2022

Portfolio Search and Optimization for General Strategy Game-Playing

Apr 21, 2021

Alexander Dockhorn, Jorge Hurtado-Grueso, Dominik Jeurissen, Linjie Xu, Diego Perez-Liebana

Abstract: Portfolio methods represent a simple but efficient type of action abstraction which has been shown to improve the performance of search-based agents in a range of strategy games. We first review existing portfolio techniques and propose a new algorithm for optimization and action-selection based on the Rolling Horizon Evolutionary Algorithm. Moreover, a series of variants are developed to solve problems in different aspects. We further analyze the performance of the discussed agents in a general strategy game-playing task. For this purpose, we run experiments on three different game-modes of the Stratega framework. For the optimization of the agents' parameters and portfolio sets, we study the use of the N-tuple Bandit Evolutionary Algorithm. The resulting portfolio sets suggest a high diversity in play-styles while being able to consistently beat the sample agents. An analysis of the agents' performance shows that the proposed algorithm generalizes well to all game-modes and is able to outperform other portfolio methods.

* 8 pages, 5 figures, submitted to CEC 2021

Generating Diverse and Competitive Play-Styles for Strategy Games

Apr 17, 2021

Diego Perez-Liebana, Cristina Guerrero-Romero, Alexander Dockhorn, Dominik Jeurissen, Linjie Xu

Abstract: Designing agents that are able to achieve different play-styles while maintaining a competitive level of play is a difficult task, especially for games in which the research community has not yet achieved super-human performance, like strategy games. These require the AI to deal with large action spaces, long-term planning and partial observability, among other well-known factors that make decision-making a hard problem. On top of this, achieving distinct play-styles using a general algorithm without reducing playing strength is not trivial. In this paper, we propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes) and show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play. Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.

* 8 pages, 2 figures, submitted to IEEE CoG 2021
