Abstract: In recent years, large language models (LLMs) have shown remarkable capabilities in a wide range of artificial intelligence problems. However, they fail to plan reliably, even when prompted with a detailed definition of the planning task. Attempts to improve their planning capabilities, such as chain-of-thought prompting, fine-tuning, and explicit "reasoning", still yield incorrect plans and usually fail to generalize to larger tasks. In this paper, we show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size. For a given planning domain, we ask an LLM to generate several domain-dependent heuristic functions in the form of Python code, evaluate them on a set of training tasks within a greedy best-first search, and choose the strongest one. The resulting LLM-generated heuristics solve many more unseen test tasks than state-of-the-art domain-independent heuristics for classical planning. They are even competitive with the strongest learning algorithm for domain-dependent planning. These findings are especially remarkable given that our proof-of-concept implementation is based on an unoptimized Python planner, whereas the baselines all build on highly optimized C++ code. In some domains, the LLM-generated heuristics expand fewer states than the baselines, revealing that they are not only efficiently computable but sometimes even more informative than state-of-the-art heuristics. Overall, our results show that sampling a set of planning heuristic function programs can significantly improve the planning capabilities of LLMs.
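The abstract describes a concrete pipeline: sample several heuristic programs from an LLM, run each inside a greedy best-first search on training tasks, and keep the strongest. Below is a minimal sketch of that loop; the task interface (`initial`, `goal_test`, `successors`) and the tasks-solved selection criterion are illustrative assumptions, not the paper's actual implementation, which may score and tie-break candidates differently.

```python
import heapq
import itertools

def gbfs(initial, goal_test, successors, heuristic):
    """Greedy best-first search: always expand the state with the
    lowest heuristic value. Returns a plan (list of actions) or None."""
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(initial), next(tie), initial, [])]
    visited = {initial}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal_test(state):
            return plan
        for action, succ in successors(state):
            if succ not in visited:
                visited.add(succ)
                heapq.heappush(
                    frontier, (heuristic(succ), next(tie), succ, plan + [action]))
    return None  # search space exhausted without reaching a goal

def select_heuristic(candidates, training_tasks):
    """Pick the candidate heuristic that solves the most training tasks
    (a simple stand-in for the paper's selection criterion)."""
    def solved(heuristic):
        return sum(gbfs(t.initial, t.goal_test, t.successors, heuristic) is not None
                   for t in training_tasks)
    return max(candidates, key=solved)
```

The selected heuristic is then used unchanged on unseen test tasks; since the candidates are ordinary Python functions, swapping them in and out of the search requires no changes to the planner itself.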
Abstract: One key decision for heuristic search algorithms is how to balance exploration and exploitation. In classical planning, novelty search has emerged as the most successful approach in this respect. The idea is to favor states that contain previously unseen facts when searching for a plan. This is done by maintaining a record of the tuples of facts observed in previous states; the novelty of a state is then the size of the smallest previously unseen tuple. The most successful version of novelty search is best-first width search (BFWS), which combines novelty measures with heuristic estimates. An orthogonal approach to balancing exploration and exploitation is to use several open lists, ordered by different heuristic estimates, which diversifies the information used in the search. The search algorithm then alternates between these open lists, trying to exploit their different estimates. This is the approach used by LAMA, a classical planner that, a decade after its release, is still considered state of the art in agile planning. In this paper, we study how to combine LAMA and BFWS. We show that simply adding the strongest open list used in BFWS to LAMA harms performance. However, combining only parts of each planner leads to a new state-of-the-art agile planner.
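The novelty measure described above is easy to state in code: keep tables of fact tuples seen in previous states, and define a state's novelty as the size of the smallest tuple of its facts not yet in the tables. The following is a minimal sketch, assuming states are given as sets of hashable facts and novelty is capped at a small width, as is standard in practice; real BFWS implementations additionally partition the tables by functions such as heuristic values, which is omitted here.

```python
from itertools import combinations

class NoveltyTracker:
    """Record fact tuples seen in previous states; novelty(facts) is the
    size of the smallest tuple of these facts never seen before (capped)."""

    def __init__(self, max_width=2):
        self.max_width = max_width
        self.seen = {k: set() for k in range(1, max_width + 1)}

    def novelty(self, facts):
        facts = sorted(facts)            # canonical order for tuples
        result = self.max_width + 1      # sentinel meaning "not novel"
        for k in range(1, self.max_width + 1):
            for tup in combinations(facts, k):
                if tup not in self.seen[k]:
                    self.seen[k].add(tup)  # record every newly seen tuple
                    result = min(result, k)
        return result

tracker = NoveltyTracker(max_width=2)
print(tracker.novelty({"a", "b"}))  # 1: both facts are new
print(tracker.novelty({"b", "c"}))  # 1: fact "c" is new
print(tracker.novelty({"a", "c"}))  # 2: facts seen before, but the pair is new
print(tracker.novelty({"a", "b"}))  # 3: nothing new, i.e. not novel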