Abstract: We consider the problem of feature selection in multi-label classification when costs are assigned to groups of features. In this task, the goal is to select a subset of features that is useful for predicting the label vector while keeping the cost of the selected features within an assumed budget. The problem is of great importance in medicine, where we may be interested in predicting various diseases based on groups of features. A group may consist of the parameters obtained from a single diagnostic test, such as a blood test. Because diagnostic tests can be very expensive, taking cost information into account when selecting relevant features is crucial to reducing the cost of making predictions. We focus on feature selection based on information theory. The proposed method consists of two steps. In the first step, we select features sequentially, maximizing conditional mutual information, until the budget is exhausted. In the second step, we select additional cost-free features, i.e., those coming from groups that have already been used in the first step. The number of added features is limited by a stopping rule based on so-called shadow features, which are randomized counterparts of the original ones. In contrast to existing approaches based on penalized criteria, our method avoids the computationally demanding optimization of a penalty parameter. Experiments conducted on the MIMIC medical database show the effectiveness of the method, especially when the assumed budget is limited.
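The two-step procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plug-in estimator of conditional mutual information, the rule that step 1 ends once no unpaid group fits the remaining budget, the use of permuted columns as shadow features, and all names (`select_features`, `group_of`, `group_cost`, the toy data) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Plug-in entropy (in nats) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())


def cmi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) = H(X,Z) - H(X,Y,Z) + H(Y,Z) - H(Z)."""
    xz, yz, xyz = list(zip(x, z)), list(zip(y, z)), list(zip(x, y, z))
    return (entropy(xz) - entropy(xyz)) + (entropy(yz) - entropy(z))


def select_features(columns, labels, group_of, group_cost, budget, seed=0):
    """Hypothetical two-step selection; returns (selected features, cost spent)."""
    rng = random.Random(seed)
    n = len(labels)
    selected, paid, spent = [], set(), 0.0

    def joint(names):
        # Joint value of the named features per sample (empty tuple if none).
        return [tuple(columns[f][i] for f in names) for i in range(n)]

    def added_cost(f):
        # A group is paid for once; further features from it are free.
        return 0.0 if group_of[f] in paid else group_cost[group_of[f]]

    # Step 1 (assumed stopping point): greedy CMI maximization while some
    # unpaid group still fits in the remaining budget.
    while any(g not in paid and spent + c <= budget for g, c in group_cost.items()):
        z = joint(selected)
        pool = [f for f in columns
                if f not in selected and spent + added_cost(f) <= budget]
        if not pool:
            break
        best = max(pool, key=lambda f: cmi(columns[f], labels, z))
        spent += added_cost(best)
        paid.add(group_of[best])
        selected.append(best)

    # Step 2: add cost-free features from paid groups until the best one no
    # longer beats the best shadow (randomly permuted) feature.
    while True:
        z = joint(selected)
        free = [f for f in columns if f not in selected and group_of[f] in paid]
        if not free:
            break
        scores = {f: cmi(columns[f], labels, z) for f in free}
        shadow_max = max(cmi(rng.sample(columns[f], n), labels, z) for f in free)
        best = max(free, key=scores.get)
        if scores[best] <= shadow_max:
            break
        selected.append(best)
    return selected, spent


# Toy data (invented): two binary labels, two feature groups with costs.
data_rng = random.Random(1)
n = 200
a = [data_rng.randint(0, 1) for _ in range(n)]
b = [data_rng.randint(0, 1) for _ in range(n)]
columns = {
    "f1": a,                                            # determines label 1
    "f2": b,                                            # determines label 2
    "noise": [data_rng.randint(0, 1) for _ in range(n)],
    "f3": [data_rng.randint(0, 1) for _ in range(n)],
}
labels = list(zip(a, b))
group_of = {"f1": "blood", "f2": "blood", "noise": "blood", "f3": "imaging"}
group_cost = {"blood": 2.0, "imaging": 5.0}
selected, spent = select_features(columns, labels, group_of, group_cost, budget=2.0)
```

In this toy setting the budget covers only the "blood" group, so step 1 buys one informative feature and step 2 adds the other at no extra cost; the noise feature adds no information once both are selected and is rejected by the shadow comparison. A plug-in estimator conditioned on the joint of all selected features degrades quickly as that joint grows, so a practical version would need a more careful criterion.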
Abstract: The hair and beauty industry is one of the fastest-growing industries. This has led to the development of various applications, such as virtual hair dyeing and hairstyle translation, to satisfy customers' needs. Although several public hair datasets are available for these applications, they consist of a limited number of low-resolution images, which restricts their performance on high-quality hair editing. Therefore, we introduce a novel large-scale Korean hairstyle dataset, K-hairstyle, with 256,679 high-resolution images. In addition, K-hairstyle contains various hair attributes annotated by Korean expert hair stylists, as well as hair segmentation masks. We validate the effectiveness of our dataset on several applications, including hairstyle translation, hair classification, and hair retrieval. Furthermore, we will release K-hairstyle soon.
Abstract: We present a fully automated method for top-down segmentation of the pulmonary arterial tree in low-dose thoracic CT images. The main basal pulmonary arteries are identified near the lung hilum by searching for candidate vessels adjacent to known airways, which are identified by our previously reported airway segmentation method. Model cylinders are iteratively fit to the vessels to track them into the lungs. Vessel bifurcations are detected by measuring the rate of change of vessel radii, and child vessels are segmented by initiating new trackers at bifurcation points. Validation is accomplished using our novel sparse surface (SS) evaluation metric. The SS metric was designed to quantify the magnitude of the segmentation error per vessel while significantly decreasing the manual marking burden for the human user. A total of 210 arteries and 205 veins were manually marked across seven test cases. Of the 210 arteries, 134 were correctly segmented, with a specificity for arteries of 90% and an average segmentation error of 0.15 mm. This fully automated segmentation is a promising method for improving lung nodule detection in low-dose CT screening scans by separating vessels from surrounding iso-intensity objects.
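The bifurcation test mentioned in this abstract (rate of change of vessel radii) can be sketched as follows. The abstract gives neither the exact criterion nor any threshold, so the absolute finite-difference rule, the function name `detect_bifurcations`, and the numbers below are hypothetical and serve only to show the idea.

```python
def detect_bifurcations(radii_mm, step_mm, threshold):
    """Return indices where the radius changes faster than `threshold`.

    radii_mm:  fitted cylinder radii at successive tracking steps (assumed input)
    step_mm:   distance between successive steps along the vessel
    threshold: maximum tolerated |dr/ds| before a bifurcation is flagged
    """
    flagged = []
    for i in range(len(radii_mm) - 1):
        rate = abs(radii_mm[i + 1] - radii_mm[i]) / step_mm
        if rate > threshold:
            flagged.append(i + 1)  # the step at which the jump is observed
    return flagged


# Invented radii: the vessel narrows abruptly at step 4.
candidates = detect_bifurcations([3.0, 3.0, 2.9, 2.9, 2.0, 2.0],
                                 step_mm=1.0, threshold=0.5)
```

Under this assumed rule, each flagged index would be a point at which new trackers are started for the child vessels.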