Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyu Liang

ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss

Nov 26, 2024

Yunyi Liu, Yingshu Li, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

Figure 1 for ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss

Figure 2 for ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss

Figure 3 for ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss

Figure 4 for ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss

Abstract:Automated radiology report generation (R2Gen) has advanced significantly, introducing challenges in accurate evaluation due to its complexity. Traditional metrics often fall short by relying on rigid word-matching or focusing only on pathological entities, leading to inconsistencies with human assessments. To bridge this gap, we introduce ER2Score, an automatic evaluation metric designed specifically for R2Gen. Our metric utilizes a reward model, guided by our margin-based reward enforcement loss, along with a tailored training data design that enables customization of evaluation criteria to suit user-defined needs. It not only scores reports according to user-specified criteria but also provides detailed sub-scores, enhancing interpretability and allowing users to adjust the criteria between different aspects of reports. Leveraging GPT-4, we designed an easy-to-use data generation pipeline, enabling us to produce extensive training data based on two distinct scoring systems, each containing reports of varying quality along with corresponding scores. These GPT-generated reports are then paired as accepted and rejected samples through our pairing rule to train an LLM towards our fine-grained reward model, which assigns higher rewards to the report with high quality. Our reward-control loss enables this model to simultaneously output multiple individual rewards corresponding to the number of evaluation criteria, with their summation as our final ER2Score. Our experiments demonstrate ER2Score's heightened correlation with human judgments and superior performance in model selection compared to traditional metrics. Notably, our model provides both an overall score and individual scores for each evaluation item, enhancing interpretability. We also demonstrate its flexible training across various evaluation systems.

Via

Access Paper or Ask Questions

Synthetic Data Generation for Residential Load Patterns via Recurrent GAN and Ensemble Method

Oct 20, 2024

Xinyu Liang, Ziheng Wang, Hao Wang

Abstract:Generating synthetic residential load data that can accurately represent actual electricity consumption patterns is crucial for effective power system planning and operation. The necessity for synthetic data is underscored by the inherent challenges associated with using real-world load data, such as privacy considerations and logistical complexities in large-scale data collection. In this work, we tackle the above-mentioned challenges by developing the Ensemble Recurrent Generative Adversarial Network (ERGAN) framework to generate high-fidelity synthetic residential load data. ERGAN leverages an ensemble of recurrent Generative Adversarial Networks, augmented by a loss function that concurrently takes into account adversarial loss and differences between statistical properties. Our developed ERGAN can capture diverse load patterns across various households, thereby enhancing the realism and diversity of the synthetic data generated. Comprehensive evaluations demonstrate that our method consistently outperforms established benchmarks in the synthetic generation of residential load data across various performance metrics including diversity, similarity, and statistical measures. The findings confirm the potential of ERGAN as an effective tool for energy applications requiring synthetic yet realistic load data. We also make the generated synthetic residential load patterns publicly available.

* IEEE Transactions on Instrumentation and Measurement, 2024
* 12 pages

Via

Access Paper or Ask Questions

MRScore: Evaluating Radiology Report Generation with LLM-based Reward System

Apr 27, 2024

Yunyi Liu, Zhanyu Wang, Yingshu Li, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

Abstract:In recent years, automated radiology report generation has experienced significant growth. This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs). Conventional NLG (natural language generation) metrics like BLEU are inadequate for accurately assessing the generated radiology reports, as systematically demonstrated by our observations within this paper. To address this challenge, we collaborated with radiologists to develop a framework that guides LLMs for radiology report evaluation, ensuring alignment with human analysis. Our framework includes two key components: i) utilizing GPT to generate large amounts of training data, i.e., reports with different qualities, and ii) pairing GPT-generated reports as accepted and rejected samples and training LLMs to produce MRScore as the model reward. Our experiments demonstrate MRScore's higher correlation with human judgments and superior performance in model selection compared to traditional metrics. Our code and datasets will be available on GitHub.

Via

Access Paper or Ask Questions

A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging

Nov 06, 2023

Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

Abstract:This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Visual Grounding. While prior efforts have explored GPT-4V's performance in medical image analysis, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V's potential in generating descriptive reports for chest X-ray images, particularly when guided by well-structured prompts. Meanwhile, its performance on the MIMIC-CXR dataset benchmark reveals areas for improvement in certain evaluation metrics, such as CIDEr. In the domain of Medical VQA, GPT-4V demonstrates proficiency in distinguishing between question types but falls short of the VQA-RAD benchmark in terms of accuracy. Furthermore, our analysis finds the limitations of conventional evaluation metrics like the BLEU scores, advocating for the development of more semantically robust assessment methods. In the field of Visual Grounding, GPT-4V exhibits preliminary promise in recognizing bounding boxes, but its precision is lacking, especially in identifying specific medical organs and signs. Our evaluation underscores the significant potential of GPT-4V in the medical imaging domain, while also emphasizing the need for targeted refinements to fully unlock its capabilities.

Via

Access Paper or Ask Questions

Hybrid Transformer-RNN Architecture for Household Occupancy Detection Using Low-Resolution Smart Meter Data

Aug 27, 2023

Xinyu Liang, Hao Wang

Abstract:Residential occupancy detection has become an enabling technology in today's urbanized world for various smart home applications, such as building automation, energy management, and improved security and comfort. Digitalization of the energy system provides smart meter data that can be used for occupancy detection in a non-intrusive manner without causing concerns regarding privacy and data security. In particular, deep learning techniques make it possible to infer occupancy from low-resolution smart meter data, such that the need for accurate occupancy detection with privacy preservation can be achieved. Our work is thus motivated to develop a privacy-aware and effective model for residential occupancy detection in contemporary living environments. Our model aims to leverage the advantages of both recurrent neural networks (RNNs), which are adept at capturing local temporal dependencies, and transformers, which are effective at handling global temporal dependencies. Our designed hybrid transformer-RNN model detects residential occupancy using hourly smart meter data, achieving an accuracy of nearly 92\% across households with diverse profiles. We validate the effectiveness of our method using a publicly accessible dataset and demonstrate its performance by comparing it with state-of-the-art models, including attention-based occupancy detection methods.

* IEEE IECON 2023 (The 49th Annual Conference of the IEEE Industrial Electronics Society)

Via

Access Paper or Ask Questions