Abstract: We explore using Large Language Models (LLMs) to generate application code that automates health insurance processes from text-based policies. We target blockchain-based smart contracts because they offer immutability, verifiability, scalability, and a trustless setting: any number of parties can use the smart contracts without previously established trust relationships with each other. Our methodology generates outputs at increasing levels of technical detail: (1) textual summaries, (2) declarative decision logic, and (3) smart contract code with unit tests. We find that LLMs perform well at task (1), and that the structured output is useful for validating tasks (2) and (3). Declarative languages (task 2) are often used to formalize healthcare policies, but executing them on a blockchain is non-trivial; hence, task (3) attempts to automate the process directly with smart contracts. To assess the LLM output, we propose completeness, soundness, clarity, syntax, and functioning code as metrics. Our evaluation employs three health insurance policies (scenarios) of increasing difficulty, drawn from Medicare's official booklet, and uses GPT-3.5 Turbo, GPT-3.5 Turbo 16K, GPT-4, GPT-4 Turbo, and CodeLLaMA. Our findings confirm that LLMs perform quite well at generating textual summaries. Although the outputs from tasks (2)-(3) are useful starting points, they require human oversight: in several cases, even "runnable" code does not yield sound results; the popularity of the target language affects output quality; and more complex scenarios still seem a bridge too far. Nevertheless, our experiments demonstrate the promise of LLMs for translating textual process descriptions into smart contracts.
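To make the three-level methodology concrete, the minimal Python sketch below prompts an OpenAI-style chat model once per output level for a single policy scenario; the prompt wording, the system message, and the helper names (TASKS, generate_outputs) are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch (illustrative, not the authors' pipeline): prompt an
    # OpenAI-style chat model once per output level for a single policy.
    from openai import OpenAI  # assumes the openai>=1.0 Python package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    TASKS = {
        "summary": "Summarize the coverage rules of this policy in plain English.",
        "decision_logic": "Express the coverage rules as declarative if/then decision logic.",
        "smart_contract": "Write a Solidity smart contract, with unit tests, that automates these coverage rules.",
    }

    def generate_outputs(policy_text: str, model: str = "gpt-4") -> dict:
        """Return the three increasingly technical outputs for one policy scenario."""
        outputs = {}
        for task, instruction in TASKS.items():
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are an expert on health insurance policies."},
                    {"role": "user", "content": f"{instruction}\n\nPolicy:\n{policy_text}"},
                ],
            )
            outputs[task] = response.choices[0].message.content
        return outputs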
Abstract: Identifying preferences expressed in natural language is an important but challenging task. State-of-the-art (SotA) methods leverage transformer-based models such as BERT and RoBERTa, as well as graph neural architectures such as graph attention networks. Since Large Language Models (LLMs) handle longer context lengths and have much larger model sizes than these transformer-based models, we investigate their ability to classify comparative text directly. This work aims to serve as a first step towards using LLMs for the comparative preference classification (CPC) task. We design and conduct a set of experiments that format the classification task into an input prompt for the LLM, together with a methodology for obtaining a fixed-format response that can be evaluated automatically. Comparing against existing methods, we find that pre-trained LLMs outperform the previous SotA models without any fine-tuning. Our results show that LLMs consistently outperform the SotA when the target text is long, i.e., composed of multiple sentences, and remain comparable to the SotA on shorter text. We also find that few-shot learning yields better performance than zero-shot learning.
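As an illustration of the prompt-and-parse setup the abstract describes, the sketch below casts one comparative sentence as a prompt with a fixed-format answer and parses the reply; the label set (BETTER, WORSE, NONE), the prompt wording, and the function name classify_preference are assumptions for illustration, not the authors' design.

    # Minimal sketch (illustrative): a zero-shot CPC prompt with a fixed-format
    # answer that can be checked automatically against gold labels.
    from openai import OpenAI  # assumes the openai>=1.0 Python package

    client = OpenAI()

    LABELS = ("BETTER", "WORSE", "NONE")  # assumed label set

    def classify_preference(sentence: str, entity_a: str, entity_b: str,
                            model: str = "gpt-4") -> str:
        """Ask the LLM which entity the sentence prefers; return one label."""
        prompt = (
            f"Sentence: {sentence}\n"
            f"Is '{entity_a}' expressed as BETTER or WORSE than '{entity_b}', "
            f"or is NONE of the two preferred? Answer with exactly one word: "
            f"{', '.join(LABELS)}."
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content.strip().upper()
        # Fall back to NONE when the model ignores the fixed format.
        return answer if answer in LABELS else "NONE"

Few-shot prompting, which the abstract reports works better than zero-shot, would prepend labelled example sentences to the same prompt.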