Namkug Kim

MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports

Jun 24, 2025

Sunggu Kyung, Hyungbin Park, Jinyoung Seo, Jimin Sung, Jihyun Kim, Dongyeong Kim, Wooyoung Jo, Yoojin Nam, Sangah Park, Taehee Kwon(+2 more)

Abstract: Computed Tomography (CT) plays a crucial role in clinical diagnosis, but the growing demand for CT examinations has raised concerns about diagnostic errors. While Multimodal Large Language Models (MLLMs) demonstrate promising comprehension of medical knowledge, their tendency to produce inaccurate information highlights the need for rigorous validation. However, existing medical visual question answering (VQA) benchmarks primarily focus on simple visual recognition tasks, lacking clinical relevance and failing to assess expert-level knowledge. We introduce MedErr-CT, a novel benchmark for evaluating medical MLLMs' ability to identify and correct errors in CT reports through a VQA framework. The benchmark includes six error categories - four vision-centric errors (Omission, Insertion, Direction, Size) and two lexical error types (Unit, Typo) - and is organized into three task levels: classification, detection, and correction. Using this benchmark, we quantitatively assess the performance of state-of-the-art 3D medical MLLMs, revealing substantial variation in their capabilities across different error types. Our benchmark contributes to the development of more reliable and clinically applicable MLLMs, ultimately helping reduce diagnostic errors and improve accuracy in clinical practice. The code and datasets are available at https://github.com/babbu3682/MedErr-CT.

* 14 pages, 5 figures, submitted to CVPR 2025

The Medical Segmentation Decathlon

Jun 10, 2021

Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers(+48 more)

Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with consistently good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non-AI experts.

Barcode Method for Generative Model Evaluation driven by Topological Data Analysis

Jun 04, 2021

Ryoungwoo Jang, Minjee Kim, Da-in Eun, Kyungjin Cho, Jiyeon Seo, Namkug Kim

Abstract: Evaluating the performance of generative models in image synthesis is a challenging task. Although the Fréchet Inception Distance is a widely accepted evaluation metric, it integrates different aspects (e.g., fidelity and diversity) of synthesized images into a single score and assumes the normality of embedded vectors. Recent methods such as precision-and-recall, and variants such as density-and-coverage, have been developed to separate fidelity and diversity based on k-nearest-neighbor methods. In this study, we propose an algorithm named barcode, which is inspired by topological data analysis and is almost free of assumptions and hyperparameter selection. In extensive experiments on real-world datasets, as well as a theoretical approach on high-dimensional normal samples, it was found that the 'usual' normality assumption of embedded vectors has several drawbacks. The experimental results demonstrate that barcode outperforms other methods in evaluating fidelity and diversity of GAN outputs. The official code is available at https://github.com/minjeekim00/Barcode.
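As a rough illustration of why a Fréchet-type distance folds fidelity and diversity into one number, the sketch below uses the one-dimensional special case, where the squared Fréchet distance between two Gaussians reduces to (mu_r - mu_f)^2 + (sigma_r - sigma_f)^2. This is not the barcode method and not the full Fréchet Inception Distance (which works on multivariate Inception embeddings); the function name and the toy samples are assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(real, fake):
    """Squared Frechet distance between 1-D Gaussians fitted to two samples.

    Illustrative univariate special case only: (mu_r - mu_f)^2 + (sd_r - sd_f)^2.
    """
    mu_r, mu_f = fmean(real), fmean(fake)
    sd_r, sd_f = pstdev(real), pstdev(fake)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2


# Hypothetical toy "embeddings" (not data from the paper).
real = [0.0, 1.0, 2.0, 3.0, 4.0]

# Same spread, shifted mean: a stand-in for a fidelity-type mismatch.
shifted = [v + 1.0 for v in real]

# Same mean, half the spread: a stand-in for a loss of diversity.
narrowed = [2.0 + 0.5 * (v - 2.0) for v in real]

d_shifted = frechet_distance_1d(real, shifted)
d_narrowed = frechet_distance_1d(real, narrowed)

print(f"shifted:  {d_shifted:.3f}")
print(f"narrowed: {d_narrowed:.3f}")
```

Both perturbations raise the same scalar, so the score alone does not reveal whether the mean or the spread changed, which is the kind of ambiguity the abstract attributes to single-score metrics.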

Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network

Mar 20, 2020

Minjee Kim, Joonmyeong Choi, Namkug Kim

Abstract: Hand hygiene is one of the most significant factors in preventing hospital-acquired infections (HAI), which are often transmitted by medical staff in contact with patients in the operating room (OR). Hand hygiene monitoring could be important for investigating and reducing outbreaks of infection within the OR. However, an effective monitoring tool for hand hygiene compliance is difficult to develop due to the visual complexity of the OR scene. Recent progress in video understanding with convolutional neural networks (CNNs) has increased the application of recognition and detection of human actions. Leveraging this progress, we propose a fully automated tool that monitors the alcohol-based hand rubbing action of anesthesiologists in OR video using spatio-temporal features from a 3D CNN. First, the regions of interest (ROIs) containing the anesthesiologists' upper bodies were detected and cropped. A temporal smoothing filter was applied to the ROIs. Then, the ROIs were given to a 3D CNN and classified into two classes: rubbing hands or other actions. We observed that transfer learning from Kinetics-400 was beneficial and that the optical flow stream was not helpful on our dataset. The final accuracy, precision, recall and F1 score in testing were 0.76, 0.85, 0.65 and 0.74, respectively.
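The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does only that arithmetic; the helper name `f1_score` is an illustrative assumption and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set values quoted in the abstract.
reported_precision = 0.85
reported_recall = 0.65

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision/recall: {f1:.2f}")
```

The result rounds to the 0.74 quoted in the abstract, so the three reported figures are mutually consistent.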

Automatic Tip Detection of Surgical Instruments in Biportal Endoscopic Spine Surgery

Nov 07, 2019

Sue Min Cho, Young-Gon Kim, Jinhoon Jeong, Ho-jin Lee, Namkug Kim

Abstract: Some endoscopic surgeries require the surgeon to hold the endoscope with one hand and the surgical instruments with the other in order to operate with a correct view. Recent technical advances in deep learning and in robotics make it possible to introduce robotics to these endoscopic surgeries. Doing so frees one of the surgeon's hands, allowing the surgeon to operate with both hands and to use more intricate and sophisticated techniques. Deep learning with convolutional neural networks has recently achieved state-of-the-art results in computer vision. Therefore, the aim of this study is to automatically detect the tip of the instrument, localize a point, and evaluate detection accuracy in biportal endoscopic spine surgery. The localized point could be used as a controller input for robotic endoscopy in these types of endoscopic surgeries.

* 7 pages, 6 figures, 3 tables

False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs

Jul 26, 2018

Sejin Park, Woochan Hwang, Kyu Hwan Jung, Joon Beom Seo, Namkug Kim

Abstract: Generating large quantities of high-quality labeled data in medical imaging is very time-consuming and expensive. The performance of supervised algorithms on various imaging tasks has improved drastically over the years; however, the availability of data to train these algorithms has become one of the main bottlenecks for implementation. To address this, we propose a semi-supervised learning method in which pseudo-negative labels from unlabeled data are used to further refine the performance of a pulmonary nodule detection network in chest radiographs. After training with the proposed network, the false positive rate was reduced from 0.4864 to 0.1266 while maintaining sensitivity at 0.89.

* Presented at the 2nd SIIM C-MIMI (SIIM Conference on Machine Intelligence in Medical Imaging)
