Ellie Shin

How much data is needed to train a medical image deep learning system to achieve necessary high accuracy?

Jan 07, 2016

Junghwan Cho, Kyewook Lee, Ellie Shin, Garry Choy, Synho Do

Abstract: Convolutional Neural Networks (CNNs) have produced very impressive results in natural image classification. Because the inherent nature of medical images makes them well suited to deep learning, applying such systems to medical image classification holds much promise. However, the usefulness and potential impact of such a system can be completely negated if it does not reach a target accuracy. In this paper, we present a study on determining the optimum size of the training data set necessary to achieve high classification accuracy with low variance in medical image classification systems. The CNN was applied to classify axial Computed Tomography (CT) images into six anatomical classes. We trained the CNN using six different training data set sizes (5, 10, 20, 50, 100, and 200) and then tested the resulting system on a total of 6000 CT images. All images were acquired from the Massachusetts General Hospital (MGH) Picture Archiving and Communication System (PACS). Using this data, we employ the learning curve approach to predict classification accuracy at a given training sample size. Our research presents a general methodology for determining the training data set size necessary to achieve a given target classification accuracy, one that can be easily applied to other problems within such systems.
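The learning curve approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which functional form the authors fit, so this example assumes an inverse power law, acc(n) = a - b * n^(-c), which is a common choice for learning curves. The function names (`fit_learning_curve`, `predict_accuracy`, `required_size`) are hypothetical, and the accuracy values are synthetic points generated from a known curve purely for illustration; they are not results from the paper. Only the six training set sizes are taken from the abstract.

```python
import math


def fit_learning_curve(sizes, accuracies):
    """Fit acc(n) = a - b * n**(-c) (an assumed inverse power law).

    For a fixed exponent c the model is linear in a and b, so we scan a grid
    of c values and solve for a and b by ordinary least squares each time,
    keeping the combination with the smallest squared error.
    """
    best = None
    for step in range(1, 301):
        c = step / 100.0  # candidate exponents 0.01 .. 3.00
        xs = [n ** (-c) for n in sizes]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(accuracies) / len(accuracies)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
        slope = sxy / sxx
        a = mean_y - slope * mean_x  # asymptotic accuracy as n grows
        b = -slope                   # scale of the gap at small n
        sse = sum((a - b * x - y) ** 2 for x, y in zip(xs, accuracies))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict_accuracy(n, a, b, c):
    """Predicted accuracy at training set size n under the fitted curve."""
    return a - b * n ** (-c)


def required_size(target, a, b, c):
    """Smallest n whose predicted accuracy reaches target, or None if the
    target is at or above the fitted asymptote a (or the fit is degenerate)."""
    if target >= a or b <= 0:
        return None
    return math.ceil((b / (a - target)) ** (1.0 / c))


# Training set sizes listed in the abstract.
sizes = [5, 10, 20, 50, 100, 200]

# SYNTHETIC accuracies from a made-up curve (a=0.98, b=0.9, c=0.5).
# These are illustrative placeholders, not measurements from the paper.
accuracies = [0.98 - 0.9 * n ** (-0.5) for n in sizes]

a, b, c = fit_learning_curve(sizes, accuracies)
print("fitted a, b, c:", a, b, c)
print("predicted accuracy at n=500:", predict_accuracy(500, a, b, c))
print("n needed for 0.95 accuracy:", required_size(0.95, a, b, c))
```

With real measurements, the accuracies would come from repeated training runs at each size, and the spread across runs would indicate the variance that the abstract also aims to keep low. A target above the fitted asymptote returns `None`, which signals that more data alone is not predicted to reach it under this model.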
