Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tristan Cazenave

APA

Adaptive Bias Generalized Rollout Policy Adaptation on the Flexible Job-Shop Scheduling Problem

May 13, 2025

Lotfi Kobrosly, Marc-Emmanuel Coupvent des Graviers, Christophe Guettier, Tristan Cazenave

Abstract:The Flexible Job-Shop Scheduling Problem (FJSSP) is an NP-hard combinatorial optimization problem, with several application domains, especially for manufacturing purposes. The objective is to efficiently schedule multiple operations on dissimilar machines. These operations are gathered into jobs, and operations pertaining to the same job need to be scheduled sequentially. Different methods have been previously tested to solve this problem, such as Constraint Solving, Tabu Search, Genetic Algorithms, or Monte Carlo Tree Search (MCTS). We propose a novel algorithm derived from the Generalized Nested Rollout Policy Adaptation, developed to solve the FJSSP. We report encouraging experimental results, as our algorithm performs better than other MCTS-based approaches, even if makespans obtained on large instances are still far from known upper bounds.

* The 19th Learning and Intelligent OptimizatioN Conference, LION19 2025

Via

Access Paper or Ask Questions

On some improvements to Unbounded Minimax

May 07, 2025

Quentin Cohen-Solal, Tristan Cazenave

Abstract:This paper presents the first experimental evaluation of four previously untested modifications of Unbounded Best-First Minimax algorithm. This algorithm explores the game tree by iteratively expanding the most promising sequences of actions based on the current partial game tree. We first evaluate the use of transposition tables, which convert the game tree into a directed acyclic graph by merging duplicate states. Second, we compare the original algorithm by Korf & Chickering with the variant proposed by Cohen-Solal, which differs in its backpropagation strategy: instead of stopping when a stable value is encountered, it updates values up to the root. This change slightly improves performance when value ties or transposition tables are involved. Third, we assess replacing the exact terminal evaluation function with the learned heuristic function. While beneficial when exact evaluations are costly, this modification reduces performance in inexpensive settings. Finally, we examine the impact of the completion technique that prioritizes resolved winning states and avoids resolved losing states. This technique also improves performance. Overall, our findings highlight how targeted modifications can enhance the efficiency of Unbounded Best-First Minimax.

Via

Access Paper or Ask Questions

Eterna is Solved

May 04, 2025

Tristan Cazenave

Abstract:RNA design consists of discovering a nucleotide sequence that folds into a target secondary structure. It is useful for synthetic biology, medicine, and nanotechnology. We propose Montparnasse, a Multi Objective Generalized Nested Rollout Policy Adaptation with Limited Repetition (MOGNRPALR) RNA design algorithm. It solves the Eterna benchmark.

Via

Access Paper or Ask Questions

BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters

Apr 29, 2025

Baz Roland, Kristina Malyseva, Anna Pappa, Tristan Cazenave

Abstract:This project introduces BrAIcht, an AI conversational agent that creates dialogues in the distinctive style of the famous German playwright Bertolt Brecht. BrAIcht is fine-tuned using German LeoLM, a large language model with 7 billion parameters and a modified version of the base Llama2 suitable for German language tasks. For fine-tuning, 29 plays of Bertolt Brecht and 907 of other German plays that are stylistically similar to Bertolt Brecht are used to form a more di-erse dataset. Due to the limited memory capacity, a parameterefficient fine-tuning technique called QLoRA is implemented to train the large language model. The results, based on BLEU score and perplexity, show very promising performance of BrAIcht in generating dialogues in the style of Bertolt Brecht.

* Generative Art Conference - GA2024, Generative Art and Design Lab, Argenia Association, Roma, Italy, Dec 2024, Venice, Italy. pp.290-296

Via

Access Paper or Ask Questions

Monte Carlo Graph Coloring

Apr 04, 2025

Tristan Cazenave, Benjamin Negrevergne, Florian Sikora

Abstract:Graph Coloring is probably one of the most studied and famous problem in graph algorithms. Exact methods fail to solve instances with more than few hundred vertices, therefore, a large number of heuristics have been proposed. Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA) are Monte Carlo search algorithms for single player games. Surprisingly, few work has been dedicated to evaluating Monte Carlo search algorithms to combinatorial graph problems. In this paper we expose how to efficiently apply Monte Carlo search to Graph Coloring and compare this approach to existing ones.

* Monte Carlo Search (MCS) 2020. Springer. Vol 1379 of Communications in Computer and Information Science. pp 100--115

Via

Access Paper or Ask Questions

Enhancing Reinforcement Learning Through Guided Search

Aug 19, 2024

Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave

Abstract:With the aim of improving performance in Markov Decision Problem in an Off-Policy setting, we suggest taking inspiration from what is done in Offline Reinforcement Learning (RL). In Offline RL, it is a common practice during policy learning to maintain proximity to a reference policy to mitigate uncertainty, reduce potential policy errors, and help improve performance. We find ourselves in a different setting, yet it raises questions about whether a similar concept can be applied to enhance performance ie, whether it is possible to find a guiding policy capable of contributing to performance improvement, and how to incorporate it into our RL agent. Our attention is particularly focused on algorithms based on Monte Carlo Tree Search (MCTS) as a guide.MCTS renowned for its state-of-the-art capabilities across various domains, catches our interest due to its ability to converge to equilibrium in single-player and two-player contexts. By harnessing the power of MCTS as a guide for our RL agent, we observed a significant performance improvement, surpassing the outcomes achieved by utilizing each method in isolation. Our experiments were carried out on the Atari 100k benchmark.

* Accepted Paper at ECAI 2024; Extended Version

Via

Access Paper or Ask Questions

LLMs can Schedule

Aug 13, 2024

Henrik Abgaryan, Ararat Harutyunyan, Tristan Cazenave

Abstract:The job shop scheduling problem (JSSP) remains a significant hurdle in optimizing production processes. This challenge involves efficiently allocating jobs to a limited number of machines while minimizing factors like total processing time or job delays. While recent advancements in artificial intelligence have yielded promising solutions, such as reinforcement learning and graph neural networks, this paper explores the potential of Large Language Models (LLMs) for JSSP. We introduce the very first supervised 120k dataset specifically designed to train LLMs for JSSP. Surprisingly, our findings demonstrate that LLM-based scheduling can achieve performance comparable to other neural approaches. Furthermore, we propose a sampling method that enhances the effectiveness of LLMs in tackling JSSP.

Via

Access Paper or Ask Questions

Perfect Information Monte Carlo with Postponing Reasoning

Aug 05, 2024

Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave

Abstract:Imperfect information games, such as Bridge and Skat, present challenges due to state-space explosion and hidden information, posing formidable obstacles for search algorithms. Determinization-based algorithms offer a resolution by sampling hidden information and solving the game in a perfect information setting, facilitating rapid and effective action estimation. However, transitioning to perfect information introduces challenges, notably one called strategy fusion.This research introduces `Extended Perfect Information Monte Carlo' (EPIMC), an online algorithm inspired by the state-of-the-art determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC enhances the capabilities of PIMC by postponing the perfect information resolution, reducing alleviating issues related to strategy fusion. However, the decision to postpone the leaf evaluator introduces novel considerations, such as the interplay between prior levels of reasoning and the newly deferred resolution. In our empirical analysis, we investigate the performance of EPIMC across a range of games, with a particular focus on those characterized by varying degrees of strategy fusion. Our results demonstrate notable performance enhancements, particularly in games where strategy fusion significantly impacts gameplay. Furthermore, our research contributes to the theoretical foundation of determinization-based algorithms addressing challenges associated with strategy fusion.%, thereby enhancing our understanding of these algorithms within the context of imperfect information game scenarios.

* Accepted in IEEE Conference on Games (CoG) 2024 + Appendix

Via

Access Paper or Ask Questions

Mixture of Public and Private Distributions in Imperfect Information Games

May 23, 2024

Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave

Figure 1 for Mixture of Public and Private Distributions in Imperfect Information Games

Figure 2 for Mixture of Public and Private Distributions in Imperfect Information Games

Figure 3 for Mixture of Public and Private Distributions in Imperfect Information Games

Figure 4 for Mixture of Public and Private Distributions in Imperfect Information Games

Abstract:In imperfect information games (e.g. Bridge, Skat, Poker), one of the fundamental considerations is to infer the missing information while at the same time avoiding the disclosure of private information. Disregarding the issue of protecting private information can lead to a highly exploitable performance. Yet, excessive attention to it leads to hesitations that are no longer consistent with our private information. In our work, we show that to improve performance, one must choose whether to use a player's private information. We extend our work by proposing a new belief distribution depending on the amount of private and public information desired. We empirically demonstrate an increase in performance and, with the aim of further improving performance, the new distribution should be used according to the position in the game. Our experiments have been done on multiple benchmarks and in multiple determinization-based algorithms (PIMC and IS-MCTS).

* 2023 IEEE Conference on Games (CoG)
* Accepted in CoG 2023

Via

Access Paper or Ask Questions

**Deep Reinforcement Learning for 5*5 Multiplayer Go**

May 23, 2024

Brahim Driss, Jérôme Arjonilla, Hui Wang, Abdallah Saffidine, Tristan Cazenave

Abstract:In recent years, much progress has been made in computer Go and most of the results have been obtained thanks to search algorithms (Monte Carlo Tree Search) and Deep Reinforcement Learning (DRL). In this paper, we propose to use and analyze the latest algorithms that use search and DRL (AlphaZero and Descent algorithms) to automatically learn to play an extended version of the game of Go with more than two players. We show that using search and DRL we were able to improve the level of play, even though there are more than two players.

* International Conference on the Applications of Evolutionary Computation (Part of EvoStar), 2023, 753--764
* Accepted in EvoApps at Evostar2023

Via

Access Paper or Ask Questions