Abstract: Automating high-volume unstructured data processing is essential for operational efficiency. Optical Character Recognition (OCR) is critical to such pipelines but often struggles with accuracy and efficiency on complex layouts and ambiguous text, challenges that are especially pronounced in large-scale tasks requiring both speed and precision. This paper introduces LMV-RPA, a Large Model Voting-based Robotic Process Automation system that enhances OCR workflows. LMV-RPA integrates outputs from OCR engines such as PaddleOCR, Tesseract, EasyOCR, and docTR with Large Language Models (LLMs) such as LLaMA 3 and Gemini-1.5-pro. Using a majority-voting mechanism, it consolidates OCR outputs into structured JSON, improving accuracy, particularly on complex layouts. The multi-phase pipeline passes text extracted by the OCR engines through the LLMs and combines their results to select the most accurate output. LMV-RPA achieves 99 percent accuracy on OCR tasks, surpassing baseline models at 94 percent, while reducing processing time by 80 percent. Benchmark evaluations confirm its scalability and show that LMV-RPA offers a faster, more reliable, and more efficient solution for automating large-scale document processing.
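The majority-voting idea described above can be illustrated with a minimal token-level sketch. This is an assumption about the mechanism, not the paper's implementation: the function name `majority_vote`, the whitespace tokenization, and the first-engine tie-break are all illustrative choices.

```python
from collections import Counter
from itertools import zip_longest

def majority_vote(ocr_outputs):
    """Combine multiple OCR engine outputs by per-token majority vote.

    ocr_outputs: list of strings, one per OCR engine.
    Ties fall back to the token from the earliest-listed engine
    (Counter preserves insertion order for equal counts).
    """
    token_lists = [text.split() for text in ocr_outputs]
    voted = []
    # Walk the token streams in lockstep; shorter outputs pad with None.
    for tokens in zip_longest(*token_lists, fillvalue=None):
        counts = Counter(t for t in tokens if t is not None)
        voted.append(counts.most_common(1)[0][0])
    return " ".join(voted)

# Example: one engine misreads "No" as "N0"; the vote corrects it.
outputs = ["Invoice No 1234", "Invoice N0 1234", "Invoice No 1234"]
print(majority_vote(outputs))  # Invoice No 1234
```

In the full system, the voted text would then be passed to the LLM stage for structuring into JSON; real implementations would also need alignment across outputs of differing lengths, which this sketch sidesteps by assuming roughly parallel token streams.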
Abstract: Few-shot learning holds substantial potential for improving the segmentation of brain tumors. Although several deep neural networks (DNNs) demonstrate promising segmentation results, they require a substantial quantity of training data to produce suitable outcomes, and most perform poorly when faced with unseen classes. To address these challenges, we propose a one-shot learning model for segmenting brain tumors in magnetic resonance images (MRI) of the brain, based on a single prototype similarity score. Leveraging recently developed few-shot learning techniques, which use support and query sets of images for training and testing, we aim to obtain a definitive tumor region by focusing on slices that contain foreground classes; this differs from other recent DNNs that use the entire set of images. Training proceeds iteratively: each iteration selects random slices containing foreground classes from randomly sampled data as the query set, along with a different random slice from the same sample as the support set. To compare the query images against the class prototypes, we employ a metric learning-based approach that relies on non-parametric thresholds. We use the multimodal Brain Tumor Image Segmentation (BraTS) 2021 dataset, which comprises 60 training images and 350 testing images, and assess the model's effectiveness using the mean Dice score and mean Intersection over Union (IoU) score.
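The prototype-based comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the masked average pooling for the prototype, and the fixed cosine-similarity threshold of 0.5 are all illustrative (the paper itself uses non-parametric thresholds).

```python
import numpy as np

def masked_average_pool(features, mask):
    """Build the class prototype: mean of support feature vectors inside the tumor mask.

    features: (H, W, C) feature map from the support slice.
    mask:     (H, W) binary foreground (tumor) mask.
    Returns a (C,) prototype vector.
    """
    foreground = features[mask.astype(bool)]  # (N, C) foreground vectors
    return foreground.mean(axis=0)

def segment_query(query_features, prototype, threshold=0.5):
    """Label query pixels whose cosine similarity to the prototype exceeds a threshold.

    query_features: (H, W, C) feature map from the query slice.
    Returns an (H, W) binary tumor prediction.
    """
    q = query_features / np.linalg.norm(query_features, axis=-1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    similarity = q @ p                         # (H, W) cosine similarity map
    return (similarity > threshold).astype(np.uint8)
```

In a full pipeline, `features` and `query_features` would come from a shared encoder applied to the support and query MRI slices, and the resulting binary predictions would be scored against ground truth with the Dice and IoU metrics mentioned above.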