Abstract:Since the emergence of Large Language Models (LLMs), the challenge of effectively leveraging their potential in healthcare has taken center stage. A critical barrier to using LLMs for extracting insights from unstructured clinical notes lies in the prompt engineering process. Despite its pivotal role in determining task performance, a clear framework for prompt optimization remains absent. Current methods to address this gap take either a manual prompt refinement approach, where domain experts collaborate with prompt engineers to create an optimal prompt, which is time-intensive and difficult to scale, or through employing automatic prompt optimizing approaches, where the value of the input of domain experts is not fully realized. To address this, we propose StructEase, a novel framework that bridges the gap between automation and the input of human expertise in prompt engineering. A core innovation of the framework is SamplEase, an iterative sampling algorithm that identifies high-value cases where expert feedback drives significant performance improvements. This approach minimizes expert intervention, to effectively enhance classification outcomes. This targeted approach reduces labeling redundancy, mitigates human error, and enhances classification outcomes. We evaluated the performance of StructEase using a dataset of de-identified clinical narratives from the US National Electronic Injury Surveillance System (NEISS), demonstrating significant gains in classification performance compared to current methods. Our findings underscore the value of expert integration in LLM workflows, achieving notable improvements in F1 score while maintaining minimal expert effort. By combining transparency, flexibility, and scalability, StructEase sets the foundation for a framework to integrate expert input into LLM workflows in healthcare and beyond.