Abstract: Trustworthiness reasoning is crucial in multiplayer games with incomplete information, enabling agents to identify potential allies and adversaries and thereby improving reasoning and decision-making. Traditional approaches that rely on pre-trained models require extensive domain-specific data and considerable reward feedback, and their lack of real-time adaptability limits their effectiveness in dynamic environments. In this paper, we introduce the Graph Retrieval Augmented Reasoning (GRATR) framework, which leverages the Retrieval-Augmented Generation (RAG) technique to bolster trustworthiness reasoning in agents. GRATR constructs a dynamic trustworthiness graph, updates it in real time with evidential information, and retrieves relevant trust data to augment the reasoning capabilities of Large Language Models (LLMs). We validate our approach through experiments on the multiplayer game "Werewolf," comparing GRATR against a baseline LLM and LLMs enhanced with Native RAG and Rerank RAG. Our results demonstrate that GRATR surpasses the baseline methods by over 30\% in win rate, with superior reasoning performance. Moreover, GRATR effectively mitigates LLM hallucinations such as identity and objective amnesia, and, crucially, it makes the reasoning process more transparent and traceable through the trustworthiness graph.
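To make the abstract's core data structure concrete, the sketch below illustrates one plausible shape for a dynamic trustworthiness graph: players as nodes, directed trust scores as edges, each edge update recorded alongside its supporting evidence so that retrieved trust data can be serialized into an LLM prompt. The class, method names, and scoring scheme are illustrative assumptions, not the paper's reference implementation.

```python
from collections import defaultdict

class TrustGraph:
    """Illustrative sketch (assumed design, not GRATR's actual code):
    nodes are players, directed edges carry trust scores that are
    updated in real time as evidence is observed during play."""

    def __init__(self, players):
        self.players = list(players)
        # trust[a][b]: how much player a currently trusts player b
        self.trust = {a: {b: 0.0 for b in players if b != a} for a in players}
        # evidence[(a, b)]: observations supporting the current a->b score
        self.evidence = defaultdict(list)

    def update(self, observer, target, delta, observation):
        """Shift the observer->target trust edge and log the evidence."""
        self.trust[observer][target] += delta
        self.evidence[(observer, target)].append(observation)

    def retrieve(self, observer, k=3):
        """Return the k strongest trust/distrust edges with their evidence,
        e.g. to be injected into an LLM prompt as retrieved context."""
        ranked = sorted(self.trust[observer].items(),
                        key=lambda kv: abs(kv[1]), reverse=True)
        return [(p, s, self.evidence[(observer, p)]) for p, s in ranked[:k]]

# Toy "Werewolf" round: Alice sees Bob defend an accused player.
g = TrustGraph(["Alice", "Bob", "Carol"])
g.update("Alice", "Bob", -0.4, "Bob defended Carol after the seer accused her")
print(g.retrieve("Alice", k=1))
```

Keeping the evidence list attached to each edge is what makes the reasoning traceable: every retrieved trust score can be explained by the observations that produced it.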
Abstract: The competition focuses on Multiparty Multiobjective Optimization Problems (MPMOPs), in which multiple decision makers have conflicting objectives, as seen in applications such as UAV path planning. Despite their importance, MPMOPs remain understudied compared to conventional multiobjective optimization. The competition aims to address this gap by encouraging researchers to explore tailored modeling approaches. The test suite comprises two parts: problems with common Pareto optimal solutions, and Biparty Multiobjective UAV Path Planning (BPMO-UAVPP) problems whose solutions are unknown. Algorithms on the first part are evaluated with the Multiparty Inverted Generational Distance (MPIGD) metric, and on the second part with the Multiparty Hypervolume (MPHV) metric. The average algorithm ranking across all problems serves as the performance benchmark.
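As a rough illustration of the first metric (a sketch under assumptions, not the competition's reference implementation), the code below computes standard Inverted Generational Distance per party and, assuming MPIGD is defined as the average of per-party IGD values, averages it across decision makers. The function names and toy data are hypothetical.

```python
import numpy as np

def igd(approx, reference):
    """Inverted Generational Distance: mean distance from each reference
    point (sampled from the true Pareto front) to its nearest point in
    the obtained approximation set."""
    # approx: (n, d) objective vectors found by the algorithm
    # reference: (m, d) objective vectors on the true Pareto front
    dists = np.linalg.norm(reference[:, None, :] - approx[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

def mpigd(approx_sets, reference_sets):
    """Assumed MPIGD: average the per-party IGD values, with one
    (approximation set, reference set) pair per decision maker."""
    return np.mean([igd(a, r) for a, r in zip(approx_sets, reference_sets)])

# Toy example: two parties, each with a 2-objective subproblem.
approx_p1 = np.array([[0.1, 0.9], [0.5, 0.5]])
ref_p1 = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
approx_p2 = np.array([[0.2, 0.8]])
ref_p2 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(mpigd([approx_p1, approx_p2], [ref_p1, ref_p2]))
```

MPHV would be aggregated analogously from per-party hypervolume values; it is used on the BPMO-UAVPP problems precisely because their true Pareto fronts are unknown, so no reference set for IGD exists.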