Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Doyle

Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation

Nov 22, 2024

Colin Diggs, Michael Doyle, Amit Madan, Siggy Scott, Emily Escamilla, Jacob Zimmer, Naveed Nekoo, Paul Ursino, Michael Bartholf, Zachary Robin(+11 more)

Abstract:Legacy software systems, written in outdated languages like MUMPS and mainframe assembly, pose challenges in efficiency, maintenance, staffing, and security. While LLMs offer promise for modernizing these systems, their ability to understand legacy languages is largely unknown. This paper investigates the utilization of LLMs to generate documentation for legacy code using two datasets: an electronic health records (EHR) system in MUMPS and open-source applications in IBM mainframe Assembly Language Code (ALC). We propose a prompting strategy for generating line-wise code comments and a rubric to evaluate their completeness, readability, usefulness, and hallucination. Our study assesses the correlation between human evaluations and automated metrics, such as code complexity and reference-based metrics. We find that LLM-generated comments for MUMPS and ALC are generally hallucination-free, complete, readable, and useful compared to ground-truth comments, though ALC poses challenges. However, no automated metrics strongly correlate with comment quality to predict or measure LLM performance. Our findings highlight the limitations of current automated measures and the need for better evaluation metrics for LLM-generated documentation in legacy systems.

* Abbreviated version submitted to LLM4Code 2025 (a workshop co-located with ICSE 2025), 13 pages, 3 figures

Via

Access Paper or Ask Questions

Testing the Effect of Code Documentation on Large Language Model Code Understanding

Apr 03, 2024

William Macke, Michael Doyle

Abstract:Large Language Models (LLMs) have demonstrated impressive abilities in recent years with regards to code generation and understanding. However, little work has investigated how documentation and other code properties affect an LLM's ability to understand and generate code or documentation. We present an empirical analysis of how underlying properties of code or documentation can affect an LLM's capabilities. We show that providing an LLM with "incorrect" documentation can greatly hinder code understanding, while incomplete or missing documentation does not seem to significantly affect an LLM's ability to understand code.

* 7 pages, 5 figures, 2 tables. Accepted as a Findings paper in the "Generation" track to NAACL 2024. MITRE Public Release Case Number 23-4132

Via

Access Paper or Ask Questions

Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Nov 27, 2023

Alexander Tapley, Marissa Dotter, Michael Doyle, Aidan Fennelly, Dhanuj Gandikota, Savanna Smith, Michael Threet, Tim Welsh

Figure 1 for Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Figure 2 for Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Figure 3 for Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Figure 4 for Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Abstract:Climate change has resulted in a year over year increase in adverse weather and weather conditions which contribute to increasingly severe fire seasons. Without effective mitigation, these fires pose a threat to life, property, ecology, cultural heritage, and critical infrastructure. To better prepare for and react to the increasing threat of wildfires, more accurate fire modelers and mitigation responses are necessary. In this paper, we introduce SimFire, a versatile wildland fire projection simulator designed to generate realistic wildfire scenarios, and SimHarness, a modular agent-based machine learning wrapper capable of automatically generating land management strategies within SimFire to reduce the overall damage to the area. Together, this publicly available system allows researchers and practitioners the ability to emulate and assess the effectiveness of firefighter interventions and formulate strategic plans that prioritize value preservation and resource allocation optimization. The repositories are available for download at https://github.com/mitrefireline.

* 12 pages, 4 figures including Appendices (A, B). Accepted as a paper in the Proposals track at the "Tackling Climate Change with Machine Learning" workshop at NeurIPS 2023. MITRE Public Release Case Number 23-3920

Via

Access Paper or Ask Questions