Abstract: Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, we introduce HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. We also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. We integrate state-of-the-art MARL algorithms and LLM-based agents with our benchmark and conduct comprehensive experiments. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.
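To make the metric dimensions named above concrete, here is a minimal Python sketch of how such multi-dimensional evaluation could be computed from episode logs; the `EpisodeLog` fields and the exact formulas are illustrative assumptions, not HLSMAC's actual API.

```python
# Hypothetical sketch of evaluation beyond win rate; field names and
# formulas are assumptions for illustration, not HLSMAC's real schema.
from dataclasses import dataclass

@dataclass
class EpisodeLog:
    won: bool                 # episode outcome
    abilities_used: int       # distinct unit abilities the agents triggered
    abilities_available: int  # distinct abilities the scenario exposes
    steps_to_objective: int   # steps taken to reach the scenario objective
    step_limit: int           # scenario time limit

def evaluate(episodes: list[EpisodeLog]) -> dict[str, float]:
    n = len(episodes)
    return {
        "win_rate": sum(e.won for e in episodes) / n,
        # Fraction of scenario-relevant abilities the agents actually use.
        "ability_utilization": sum(
            e.abilities_used / e.abilities_available for e in episodes) / n,
        # Higher when the objective is reached in fewer steps.
        "advancement_efficiency": sum(
            1.0 - e.steps_to_objective / e.step_limit for e in episodes) / n,
    }
```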
Abstract: People need to internalize the skills of AI agents to improve their own capabilities. Our paper focuses on Mahjong, a multiplayer imperfect-information game that requires effective long-term decision-making amidst randomness and hidden information. Through the efforts of AI researchers, several impressive Mahjong AI agents have already achieved performance levels comparable to those of professional human players; however, these agents are often treated as black boxes from which few insights can be gleaned. This paper introduces Mxplainer, a parameterized search algorithm that can be converted into an equivalent neural network to learn the parameters of black-box agents. Experiments conducted on AI and human player data demonstrate that the learned parameters provide human-understandable insights into these agents' characteristics and play styles. In addition to analyzing the learned parameters, we also showcase how our search-based framework can locally explain the decision-making processes of black-box agents for most Mahjong game states.
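To illustrate the core idea of converting a parameterized search into an equivalent network, here is a hedged PyTorch sketch: if the search scores candidate actions by a weighted sum of hand-crafted features, that scoring rule is exactly a linear layer, so fitting the weights to a black-box agent's choices recovers interpretable parameters. The feature set, dimensions, and training loop are assumptions, not Mxplainer's actual architecture.

```python
# Minimal sketch: a linear-scoring search rewritten as a trainable net.
import torch
import torch.nn as nn

class SearchAsNet(nn.Module):
    def __init__(self, num_features: int):
        super().__init__()
        # The search algorithm's tunable parameters become network weights.
        self.weights = nn.Linear(num_features, 1, bias=False)

    def forward(self, action_features: torch.Tensor) -> torch.Tensor:
        # action_features: (batch, num_actions, num_features)
        return self.weights(action_features).squeeze(-1)  # action scores

model = SearchAsNet(num_features=16)
loss_fn = nn.CrossEntropyLoss()  # imitate the black-box agent's picks
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

feats = torch.randn(32, 34, 16)      # e.g. 34 candidate discards per state
picks = torch.randint(0, 34, (32,))  # the black-box agent's decisions
loss = loss_fn(model(feats), picks)
loss.backward()
opt.step()
# After training, model.weights exposes the relative importance of each
# search feature, i.e. a human-readable summary of the agent's style.
```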
Abstract: As multimodal large language models (MLLMs) continue to advance across challenging tasks, a key question emerges: what essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment -- not limited to language, but also involving multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support multi-turn, multimodal interaction. In particular, they should comprehend interleaved multimodal contexts and respond coherently in ongoing exchanges. In this work, we present an initial exploration through InterMT -- the first preference dataset for multi-turn multimodal interaction, grounded in real human feedback. In this exploration, we particularly emphasize the importance of human oversight, introducing expert annotations to guide the process, motivated by the fact that current MLLMs lack such complex interactive capabilities. InterMT captures human preferences at both global and local levels across nine sub-dimensions, and consists of 15.6k prompts, 52.6k multi-turn dialogue instances, and 32.4k human-labeled preference pairs. To compensate for current models' limited capability for multimodal understanding and generation, we introduce an agentic workflow that leverages tool-augmented MLLMs to construct multi-turn QA instances. To further this goal, we introduce InterMT-Bench to assess the ability of MLLMs to assist judges in multi-turn, multimodal tasks. We demonstrate the utility of InterMT through applications such as judge moderation, and further reveal the multi-turn scaling law of judge models. We hope that open-sourcing our data can help facilitate further research on aligning current MLLMs to the next step. Our project website can be found at https://pku-intermt.github.io .
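For readers who want a concrete picture of what a multi-turn multimodal preference pair with global and local judgments might contain, the following schema is an illustrative sketch only; InterMT's real data format and field names may differ.

```python
# Illustrative schema for a multi-turn multimodal preference record;
# all field names here are assumptions, not InterMT's actual format.
from dataclasses import dataclass

@dataclass
class TurnResponse:
    text: str
    image_refs: list[str]        # paths/URLs of generated or cited images

@dataclass
class PreferencePair:
    prompt: str
    history: list[TurnResponse]  # interleaved multimodal context so far
    chosen: TurnResponse
    rejected: TurnResponse
    # A local judgment for this turn and a global one for the dialogue,
    # each broken into sub-dimension scores (nine in the paper).
    local_scores: dict[str, float]
    global_scores: dict[str, float]
```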
Abstract: Business Process Management (BPM) is gaining increasing attention for its potential to cut costs while boosting output and quality. Business process document generation is a crucial stage in BPM. However, due to a shortage of datasets, data-driven deep learning techniques struggle to deliver the expected results. We propose an approach that transforms Conditional Process Trees (CPTs) into Business Process Text Sketches (BPTSs) using Large Language Models (LLMs). The traditional prompting approach (few-shot in-context learning) tries to obtain the correct answer in one pass; it can learn the pattern of transforming simple CPTs into BPTSs, but for closed-domain settings and CPTs with complex hierarchies it performs poorly, with low correctness. Drawing inspiration from the divide-and-conquer strategy, we instead break a difficult CPT down into a number of simple CPTs and then solve each one in turn. We randomly chose 100 process trees with depths ranging from 2 to 5, as well as CPTs with many nodes, high selection degrees, and cyclic nesting. Experiments show that our method achieves a correctness rate of 93.42%, which is 45.17% better than traditional prompting methods. Our proposed method provides a solution for business process document generation in the absence of datasets, and it also becomes potentially possible to supply large numbers of datasets to the process model extraction (PME) domain.
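The divide-and-conquer idea above can be sketched in a few lines. The following Python outline assumes a CPT is a nested tree of labeled nodes and uses a placeholder `call_llm` for whatever LLM client is available; the prompt wording is illustrative, not the paper's actual prompts.

```python
# Sketch of divide-and-conquer CPT-to-BPTS translation under the
# assumptions above; not the paper's exact decomposition rules.
from dataclasses import dataclass, field

@dataclass
class CPTNode:
    label: str                      # activity name or gateway condition
    children: list["CPTNode"] = field(default_factory=list)

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in an LLM client here

def cpt_to_sketch(node: CPTNode) -> str:
    # Base case: a simple (leaf-level) CPT is translated in one shot,
    # the pattern few-shot prompting already handles well.
    if not node.children:
        return call_llm(f"Translate this simple process step: {node.label}")
    # Recursive case: solve each child subtree first, then ask the LLM
    # to compose the partial sketches under the parent's condition.
    parts = [cpt_to_sketch(child) for child in node.children]
    return call_llm(
        f"Combine these process text sketches under '{node.label}':\n"
        + "\n".join(parts)
    )
```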
Abstract: Evolutionary game theory has been a successful tool for combining classical game theory with learning-dynamical descriptions in multiagent systems. Assuming some symmetric structure among interacting players, many studies have focused on using a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even the state-of-the-art method has two limitations. First, its analysis of the simplified payoff table is inaccurate. Second, no existing work is able to deal with 2-population multiplayer asymmetric games. In this paper, we bridge the gap between the heuristic payoff table and dynamical analysis without any inaccuracy. In addition, we propose a general framework for $m$ versus $n$ 2-population multiplayer asymmetric games. We then compare our method with the state-of-the-art on some classic games. Finally, to illustrate our method, we perform empirical game-theoretic analysis on Wolfpack as well as StarCraft II, both of which involve complex multiagent interactions.
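For context, the two-population replicator dynamics below are the standard backbone of such dynamical analyses; the paper's exact $m$ versus $n$ multiplayer formulation may differ. Here $x$ and $y$ are the strategy frequencies of the two populations and $A$, $B$ their respective payoff matrices.

```latex
% Standard two-population (asymmetric) replicator dynamics; shown for
% orientation only, the paper's m-versus-n formulation may differ.
\begin{align}
  \dot{x}_i &= x_i \left[ (A y)_i - x^{\top} A y \right], \\
  \dot{y}_j &= y_j \left[ (B^{\top} x)_j - x^{\top} B y \right].
\end{align}
```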
Abstract: We show that for the problem of minimizing (or maximizing) the ratio of two supermodular functions, no bounded approximation ratio can be achieved with a polynomial number of queries if the two supermodular functions are both monotone non-decreasing or both monotone non-increasing.
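For precision, here is the problem restated in standard notation (a restatement for the reader, with $V$ the ground set and $f$, $g$ accessed through value queries):

```latex
% A set function f : 2^V -> R is supermodular iff, for all
% S \subseteq T \subseteq V and every v \in V \setminus T,
%   f(S \cup \{v\}) - f(S) \le f(T \cup \{v\}) - f(T).
% The inapproximability result concerns the ratio problem
\[
  \min_{\emptyset \neq S \subseteq V} \; \frac{f(S)}{g(S)}
  \qquad (\text{or the corresponding } \max),
\]
% where f and g are supermodular and both monotone non-decreasing or
% both monotone non-increasing: no algorithm making polynomially many
% value queries achieves any bounded approximation ratio.
```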
Abstract: We show that for the cardinality-constrained monotone submodular maximization problem, there exists a $(1-1/e-\varepsilon)$-approximate deterministic algorithm with linear query complexity, which performs $O(n/\varepsilon)$ queries in total.
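The paper's own algorithm is not reproduced here; as a flavor of how threshold-based greedy methods trade an $\varepsilon$ approximation loss for fewer oracle calls, below is a classical descending-thresholds greedy sketch in Python. Note that its query count carries an extra logarithmic factor over the linear bound claimed above.

```python
# Classical descending-thresholds greedy (Badanidiyuru-Vondrak style);
# NOT the paper's algorithm, just the flavor of the eps/query trade-off.
def threshold_greedy(f, ground_set, k, eps):
    """f: monotone submodular value oracle on sets; returns S, |S| <= k."""
    S: set = set()
    d = max(f({e}) for e in ground_set)        # largest singleton value
    tau = d
    while tau >= (eps / len(ground_set)) * d:
        for e in ground_set:
            if len(S) == k:
                return S
            # Add e whenever its marginal gain clears the threshold.
            if e not in S and f(S | {e}) - f(S) >= tau:
                S.add(e)
        tau *= (1 - eps)                       # lower the bar and rescan
    return S
```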
Abstract: In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model, with the goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms, for example by dividing the task into subtasks, learning low-level control, or incorporating expert knowledge and using imitation learning.
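As an example of one heuristic mentioned above, here is a minimal gym-style frame-skipping wrapper. It is a generic sketch, not any particular team's code; the challenge's actual osim-rl environment and the teams' reward handling differed in details.

```python
# Generic frame-skipping wrapper in the classic gym API (sketch only).
import gym

class FrameSkip(gym.Wrapper):
    """Repeat each action for `skip` simulator steps, summing rewards.

    This shortens the effective horizon and cuts per-step policy
    queries, which helps with slow musculoskeletal simulators."""
    def __init__(self, env, skip: int = 4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```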
Abstract: With the rapid growth of the express delivery industry, intelligent warehouses that employ autonomous robots to carry parcels have been widely used to handle the vast express volume. For such warehouses, the layout design plays a key role in improving transportation efficiency. However, this work is still done by human experts, which is expensive and leads to suboptimal results. In this paper, we aim to automate the warehouse layout design process. We propose a two-layer evolutionary algorithm to efficiently explore the warehouse layout space, in which an auxiliary-objective fitness approximation model is introduced to predict the outcome of a designed warehouse layout, and a two-layer population structure incorporates the approximation model into the ordinary evolutionary framework. Empirical experiments show that our method can efficiently design effective warehouse layouts that outperform both heuristic-designed and vanilla evolution-designed layouts.
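A minimal sketch of surrogate-assisted evolution in the spirit of the two-layer scheme follows: a cheap fitness-approximation model pre-screens a large candidate pool so that the expensive warehouse simulation is spent only on promising layouts. The function names and selection scheme are assumptions, not the paper's exact algorithm.

```python
# Surrogate-assisted evolutionary loop (sketch; details are assumed).
import random

def evolve(init_pop, mutate, simulate, surrogate_fit, surrogate_predict,
           generations=50, pool_mult=10):
    pop = [(x, simulate(x)) for x in init_pop]       # expensive evaluations
    for _ in range(generations):
        surrogate_fit(pop)                           # refit on ground truth
        # Lower layer: a large mutated pool, ranked by the cheap model.
        pool = [mutate(random.choice(pop)[0])
                for _ in range(pool_mult * len(pop))]
        pool.sort(key=surrogate_predict, reverse=True)
        # Upper layer: spend simulations only on the model's top picks.
        evaluated = [(x, simulate(x)) for x in pool[:len(init_pop)]]
        pop = sorted(pop + evaluated, key=lambda p: p[1],
                     reverse=True)[:len(init_pop)]
    return max(pop, key=lambda p: p[1])
```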
Abstract: In typical reinforcement learning (RL), the environment is assumed given, and the goal of learning is to identify an optimal policy for the agent taking actions through its interactions with that environment. In this paper, we extend this setting by considering an environment that is not given but is instead controllable and learnable through its interaction with the agent. This extension is motivated by real-world environment design scenarios, including game design, shopping space design, and traffic signal design. Theoretically, we identify a Markov decision process (MDP) for the environment that is dual to the agent's MDP, and derive a policy gradient solution for optimizing the parametrized environment. Furthermore, we propose a general generative framework to address discontinuous environments. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.
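The core idea of taking policy gradients with respect to environment parameters can be sketched as a toy REINFORCE loop, shown below. The categorical configuration space and the stand-in design objective are illustrative assumptions, not the paper's actual dual-MDP formulation.

```python
# Toy REINFORCE over environment parameters theta (sketch only).
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(8)  # logits over 8 hypothetical maze configurations

def design_objective(idx: int) -> float:
    # Stand-in for rolling out agents in the sampled environment and
    # scoring how challenging/diverse it is; here config 3 is "best".
    return 1.0 if idx == 3 else 0.0

lr, baseline = 0.1, 0.0
for _ in range(1000):
    probs = np.exp(theta) / np.exp(theta).sum()
    idx = rng.choice(len(theta), p=probs)     # sample an environment
    r = design_objective(idx)
    baseline += 0.05 * (r - baseline)         # running baseline
    grad_logp = -probs
    grad_logp[idx] += 1.0                     # d log pi(idx) / d theta
    theta += lr * (r - baseline) * grad_logp  # gradient ascent step
```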