Abstract:Data is crucial for evidence-based policymaking and enhancing public services, including those at the Ministry of Finance of the Republic of Indonesia. However, the complexity and dynamic nature of governmental financial data and regulations can hinder decision-making. This study investigates the potential of Large Language Models (LLMs) to address these challenges, focusing on Indonesia's financial data and regulations. While LLMs are effective in the financial sector, their use in the public sector in Indonesia is unexplored. This study undertakes an iterative process to develop KemenkeuGPT using the LangChain with Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning. The dataset from 2003 to 2023 was collected from the Ministry of Finance, Statistics Indonesia and the International Monetary Fund (IMF). Surveys and interviews with Ministry officials informed, enhanced and fine-tuned the model. We evaluated the model using human feedback, LLM-based evaluation and benchmarking. The model's accuracy improved from 35% to 61%, with correctness increasing from 48% to 64%. The Retrieval-Augmented Generation Assessment (RAGAS) framework showed that KemenkeuGPT achieved 44% correctness with 73% faithfulness, 40% precision and 60% recall, outperforming several other base models. An interview with an expert from the Ministry of Finance indicated that KemenkeuGPT has the potential to become an essential tool for decision-making. These results are expected to improve with continuous human feedback.