Abstract:Magnetic resonance spectroscopic imaging is a widely available imaging modality that can non-invasively provide a metabolic profile of the tissue of interest, yet is challenging to integrate clinically. One major reason is the expensive, expert data processing and analysis that is required. Using machine learning to predict MRS-related quantities offers avenues around this problem, but deep learning models bring their own challenges, especially model trust. Current research trends focus primarily on mean error metrics, but comprehensive precision metrics are also needed, e.g. standard deviations, confidence intervals, etc.. This work highlights why more comprehensive error characterization is important and how to improve the precision of CNNs for spectral modeling, a quantitative task. The results highlight advantages and trade-offs of these techniques that should be considered when addressing such regression tasks with CNNs. Detailed insights into the underlying mechanisms of each technique, and how they interact with other techniques, are discussed in depth.
Abstract:Radiologists today play a key role in making diagnostic decisions and labeling images for training A.I. algorithms. Low inter-reader reliability (IRR) can be seen between experts when interpreting challenging cases. While teams-based decisions are known to outperform individual decisions, inter-personal biases often creep up in group interactions which limit non-dominant participants from expressing true opinions. To overcome the dual problems of low consensus and inter-personal bias, we explored a solution modeled on biological swarms of bees. Two separate cohorts; three radiologists and five radiology residents collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) observations. The IRR of the consensus votes was compared to the IRR of the majority and most confident votes of the two cohorts.The radiologist cohort saw an improvement of 23% in IRR of swarm votes over majority vote. Similar improvement of 23% in IRR in 3-resident swarm votes over majority vote, was observed. The 5-resident swarm had an even higher improvement of 32% in IRR over majority vote. Swarm consensus votes also improved specificity by up to 50%. The swarm consensus votes outperformed individual and majority vote decisions in both the radiologists and resident cohorts. The 5-resident swarm had higher IRR than 3-resident swarm indicating positive effect of increased swarm size. The attending and resident swarms also outperformed predictions from a state-of-the-art A.I. algorithm. Utilizing a digital swarm platform improved agreement and allows participants to express judgement free intent, resulting in superior clinical performance and robust A.I. training labels.