Abstract:Recent advances in large language models (LLMs) show potential for clinical applications, such as clinical decision support and trial recommendations. However, the GPT-4 LLM predicts an excessive number of ICD codes for medical coding tasks, leading to high recall but low precision. To tackle this challenge, we introduce LLM-codex, a two-stage approach to predict ICD codes that first generates evidence proposals using an LLM and then employs an LSTM-based verification stage. The LSTM learns from both the LLM's high recall and human expert's high precision, using a custom loss function. Our model is the only approach that simultaneously achieves state-of-the-art results in medical coding accuracy, accuracy on rare codes, and sentence-level evidence identification to support coding decisions without training on human-annotated evidence according to experiments on the MIMIC dataset.
Abstract:Hyperspectral data consists of large number of features which require sophisticated analysis to be extracted. A popular approach to reduce computational cost, facilitate information representation and accelerate knowledge discovery is to eliminate bands that do not improve the classification and analysis methods being applied. In particular, algorithms that perform band elimination should be designed to take advantage of the specifics of the classification method being used. This paper employs a recently proposed filter-feature-selection algorithm based on minimizing a tight bound on the VC dimension. We have successfully applied this algorithm to determine a reasonable subset of bands without any user-defined stopping criteria on widely used hyperspectral images and demonstrate that this method outperforms state-of-the-art methods in terms of both sparsity of feature set as well as accuracy of classification.\end{abstract}
Abstract:The Vapnik-Chervonenkis (VC) dimension measures the complexity of a learning machine, and a low VC dimension leads to good generalization. The recently proposed Minimal Complexity Machine (MCM) learns a hyperplane classifier by minimizing an exact bound on the VC dimension. This paper extends the MCM classifier to the fuzzy domain. The use of a fuzzy membership is known to reduce the effect of outliers, and to reduce the effect of noise on learning. Experimental results show, that on a number of benchmark datasets, the the fuzzy MCM classifier outperforms SVMs and the conventional MCM in terms of generalization, and that the fuzzy MCM uses fewer support vectors. On several benchmark datasets, the fuzzy MCM classifier yields excellent test set accuracies while using one-tenth the number of support vectors used by SVMs.