Abstract: Vision-language models (VLMs) are increasingly important in medical applications; however, their evaluation in dermatology remains limited by datasets that focus primarily on image-level classification tasks such as lesion recognition. While valuable for recognition, such datasets cannot assess the visual understanding, language grounding, and clinical reasoning capabilities of multimodal models. Visual question answering (VQA) benchmarks are needed to evaluate how models interpret dermatological images, reason over fine-grained morphology, and generate clinically meaningful descriptions. We introduce DermaBench, a clinician-annotated dermatology VQA benchmark built on the Diverse Dermatology Images (DDI) dataset. DermaBench comprises 656 clinical images from 570 unique patients spanning Fitzpatrick skin types I–VI. Using a hierarchical annotation schema with 22 main questions (single-choice, multi-choice, and open-ended), expert dermatologists annotated each image for diagnosis, anatomic site, lesion morphology, distribution, surface features, color, and image quality, and also provided open-ended narrative descriptions and summaries, yielding approximately 14,474 VQA-style annotations. DermaBench is released as a metadata-only dataset to respect upstream licensing and is publicly available on Harvard Dataverse.
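As a minimal sketch of the three question types in the hierarchical schema described above, the Python below models one annotation record and checks that an answer is consistent with its question type. The field names (`qid`, `parent`, `image_id`), the example questions and options, and the validation rules are hypothetical illustrations, not the format of the released DermaBench files.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class QuestionType(Enum):
    SINGLE_CHOICE = "single-choice"
    MULTI_CHOICE = "multi-choice"
    OPEN_ENDED = "open-ended"


@dataclass
class Question:
    qid: str                      # hypothetical question identifier
    text: str
    qtype: QuestionType
    options: List[str] = field(default_factory=list)  # empty for open-ended questions
    parent: Optional[str] = None  # hypothetical link to a parent question in the hierarchy


@dataclass
class Annotation:
    image_id: str                 # hypothetical key into the upstream DDI images (metadata only)
    question: Question
    answer: List[str]             # exactly one entry unless the question is multi-choice


def validate(ann: Annotation) -> bool:
    """Return True if the answer is consistent with the question type (assumed rules)."""
    q = ann.question
    if q.qtype is QuestionType.SINGLE_CHOICE:
        return len(ann.answer) == 1 and ann.answer[0] in q.options
    if q.qtype is QuestionType.MULTI_CHOICE:
        return len(ann.answer) >= 1 and all(a in q.options for a in ann.answer)
    # Open-ended: accept a single non-empty free-text answer.
    return len(ann.answer) == 1 and ann.answer[0].strip() != ""


# Usage with made-up questions (not taken from DermaBench):
morphology = Question("morphology", "What is the primary lesion morphology?",
                      QuestionType.SINGLE_CHOICE, ["macule", "papule", "plaque"])
color = Question("color", "Which colors are present?",
                 QuestionType.MULTI_CHOICE, ["brown", "red", "black"])
print(validate(Annotation("img_0001", morphology, ["papule"])))    # True
print(validate(Annotation("img_0001", color, ["brown", "blue"])))  # False
```

Keeping the answer as a list for every question type lets one record format cover single-choice, multi-choice, and free-text answers; the `parent` field is one simple way to express follow-up questions in a hierarchy.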
Abstract: Foundation models have transformed medical image analysis by providing robust feature representations that reduce the need for large-scale task-specific training. However, current dermatology benchmarks often reduce the complex diagnostic taxonomy to flat, binary classification tasks, such as distinguishing melanoma from benign nevi. This oversimplification obscures a model's ability to perform fine-grained differential diagnosis, which is critical for integration into clinical workflows. This study evaluates the utility of embeddings derived from ten foundation models, spanning general computer vision, general medical imaging, and dermatology-specific domains, for hierarchical skin lesion classification. Using the DERM12345 dataset, which comprises 40 lesion subclasses, we extracted frozen embeddings and trained lightweight adapter models with five-fold cross-validation. We introduce a hierarchical evaluation framework that assesses performance across four levels of clinical granularity: 40 Subclasses, 15 Main Classes, 2 and 4 Superclasses, and Binary Malignancy. Our results reveal a "granularity gap" in model capabilities: MedImageInsights achieved the strongest overall performance (97.52% weighted F1-score on Binary Malignancy detection) but dropped to 65.50% on fine-grained 40-class subtype classification. Conversely, MedSigLip (69.79%) and the dermatology-specific models (Derm Foundation and MONET) excelled at fine-grained 40-class subtype discrimination but performed below MedImageInsights on broader classification tasks. Our findings suggest that while general medical foundation models are highly effective for high-level screening, specialized modeling strategies are necessary for the granular distinctions required in diagnostic support systems.
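One way to realise the hierarchical evaluation described above is to score a single set of fine-grained predictions at every coarser level by relabelling both the ground truth and the predictions through a subclass-to-parent mapping. The sketch below assumes that design (the study may instead train a separate adapter per level), uses hypothetical class names and mappings rather than the DERM12345 taxonomy, and computes a support-weighted F1 in the same way as the usual "weighted" averaging.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        # n >= 1, so the denominator 2*tp + fp + fn is never zero here.
        score += (2 * tp / (2 * tp + fp + fn)) * n / total
    return score


def hierarchical_scores(y_true: Sequence[str], y_pred: Sequence[str],
                        level_maps: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Weighted F1 at the subclass level and at each coarser level given by a mapping."""
    scores = {"subclass": weighted_f1(y_true, y_pred)}
    for level, mapping in level_maps.items():
        scores[level] = weighted_f1([mapping[t] for t in y_true],
                                    [mapping[p] for p in y_pred])
    return scores


# Hypothetical subclasses and a hypothetical binary-malignancy mapping.
malignancy = {"bcc_nodular": "malignant", "bcc_superficial": "malignant",
              "melanoma_in_situ": "malignant",
              "nevus_junctional": "benign", "nevus_compound": "benign"}
y_true = ["bcc_nodular", "nevus_junctional", "melanoma_in_situ", "nevus_compound"]
y_pred = ["bcc_superficial", "nevus_compound", "nevus_junctional", "nevus_compound"]
scores = hierarchical_scores(y_true, y_pred, {"malignancy": malignancy})
print({k: round(v, 3) for k, v in scores.items()})  # {'subclass': 0.167, 'malignancy': 0.733}
```

In this toy example only one of the four subclass predictions is exactly right, yet three of the four fall in the correct malignancy group, so the coarse-level score is much higher than the fine-level one. Adding a main-class or superclass level only requires another entry in `level_maps`.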
Abstract: Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They support artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Existing large published datasets only partially cover the subclassifications of skin lesions. This limitation highlights the need for larger and more varied datasets to reduce false predictions and improve failure analysis for skin lesions. This study presents a diverse dataset of 12,345 dermatoscopic images covering 38 subclasses of skin lesions, collected in Turkiye, which lies in the transition zone between Europe and Asia and encompasses a range of skin types. Each subgroup contains high-resolution images and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research and deepens understanding of these skin lesions. The dataset is distinguished by its diverse structure of 5 superclasses, 15 main classes, and 38 subclasses, together with its 12,345 high-resolution dermatoscopic images.




Abstract: High-grade precancerous cervical lesions and early-stage cervical cancers, which mainly affect women of reproductive age, are often managed with fertility-sparing treatments. Commonly used local treatments for cervical precancers carry a risk of leaving a positive cancer margin and of causing subsequent complications, depending on the precision and depth of excision. An intraoperative device that allows careful excision of diseased tissue while conserving healthy cervical tissue could improve such treatment. In this study, we developed a polymer-based robotic fiber, 150 mm long and 1.7 mm in diameter, fabricated using a highly scalable fiber drawing technique. This robotic fiber uses a hybrid actuation mechanism that combines electrothermal and tendon-driven actuation, enabling a maximum motion range of 46 mm from its origin with sub-100 μm motion precision. We also developed control algorithms for these actuation methods, including predefined path control and telemanipulation, which enable coarse positioning of the fiber tip at the target area followed by a precise scan. Combining a surgical laser fiber with the robotic fiber enables high-precision surgical ablation. Additionally, experiments on a cervical phantom demonstrated the robotic fiber's ability to access the target area and perform high-precision scans, highlighting its potential for treating cervical disease and improving oncological outcomes.
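The coarse-then-fine strategy described above can be illustrated with a predefined raster path for the precise scanning stage. The sketch below generates boustrophedon (back-and-forth) waypoints over a small rectangle around the coarsely positioned tip. The function name, the 0.1 mm step, and the treatment of the reported 46 mm range as a radial limit around the origin are illustrative assumptions, not the authors' controller.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Assumption: the reported 46 mm motion range is treated as a radial limit around the origin.
MAX_RANGE_MM = 46.0


def raster_path(center: Point, width_mm: float, height_mm: float,
                step_mm: float) -> List[Point]:
    """Boustrophedon waypoints covering a width x height rectangle centred on `center`.

    Even rows run left to right and odd rows right to left, so consecutive
    waypoints are always one step apart and the tip never jumps across the region.
    """
    if step_mm <= 0:
        raise ValueError("step must be positive")
    cx, cy = center
    n_rows = int(round(height_mm / step_mm)) + 1
    n_cols = int(round(width_mm / step_mm)) + 1
    x0, y0 = cx - width_mm / 2, cy - height_mm / 2
    path: List[Point] = []
    for r in range(n_rows):
        y = y0 + r * step_mm
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        for c in cols:
            x = x0 + c * step_mm
            if (x * x + y * y) ** 0.5 > MAX_RANGE_MM:
                raise ValueError("waypoint outside the assumed reachable workspace")
            path.append((x, y))
    return path


# Hypothetical fine scan: 1.0 mm x 0.5 mm region around a coarse target at (10, 5) mm.
path = raster_path(center=(10.0, 5.0), width_mm=1.0, height_mm=0.5, step_mm=0.1)
print(len(path))  # 66 waypoints: 6 rows of 11 points
```

A serpentine ordering is a common choice for scanning because it avoids long return strokes between rows; in a real system each waypoint would be passed to the actuation controller rather than printed.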
Abstract: In recent years, steerable needles have attracted significant interest in Minimally Invasive Surgery (MIS). Among these, the flexible Programmable-bevel tip needle (PBN) concept has been successfully demonstrated in vivo with a 2.5 mm PBN prototype, where the feasibility of Convection-Enhanced Delivery (CED) of chemotherapeutics was evaluated in an ovine model. However, further size reduction is necessary for other diagnostic and therapeutic procedures involving deep-seated tissue structures. Because PBNs have a complex cross-sectional geometry, standard production methods such as extrusion fail as the outer diameter is reduced further. This paper presents our first attempt at a new manufacturing method for the PBN based on thermal drawing technology. Experimental characterisation tests were performed on the 2.5 mm PBN and on a new 1.3 mm Thermally Drawn (TD) PBN prototype described here. The results show that thermal drawing offers a significant advantage for miniaturising complex needle structures. However, steering behaviour is affected by the choice of material in this first attempt, a limitation that will be addressed in future work.
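The miniaturisation advantage of thermal drawing follows from conservation of volume: a large preform carrying the full cross-sectional geometry is heated and pulled, and, under the usual first-order assumptions of steady state and an incompressible melt, every cross-sectional feature shrinks by the same draw-down ratio. The sketch below applies that relation. The preform diameter, feed speed, and internal feature size are hypothetical values, since the abstract does not report them.

```python
def draw_down_ratio(preform_od_mm: float, fiber_od_mm: float) -> float:
    """Linear scale factor between the preform and drawn fiber cross-sections."""
    return preform_od_mm / fiber_od_mm


def draw_speed(feed_speed: float, preform_od_mm: float, fiber_od_mm: float) -> float:
    """Pull speed from volume conservation: v_feed * A_preform = v_draw * A_fiber.

    For a geometrically similar cross-section the areas scale with the square
    of the outer diameter, so v_draw = v_feed * (D_preform / D_fiber) ** 2.
    """
    return feed_speed * draw_down_ratio(preform_od_mm, fiber_od_mm) ** 2


def drawn_feature_size(preform_feature_mm: float, preform_od_mm: float,
                       fiber_od_mm: float) -> float:
    """First-order size of an internal feature (e.g. a lumen) after drawing.

    Ignores reflow, surface tension, and other distortions seen in practice.
    """
    return preform_feature_mm / draw_down_ratio(preform_od_mm, fiber_od_mm)


# Hypothetical 26 mm preform drawn to the 1.3 mm outer diameter reported above.
print(round(draw_down_ratio(26.0, 1.3), 2))             # 20.0
print(round(draw_speed(1.0, 26.0, 1.3), 1))             # 400.0 (same units as the feed speed)
print(round(drawn_feature_size(2.0, 26.0, 1.3), 3))     # 0.1 (mm)
```

Because the draw speed grows with the square of the draw-down ratio, modest changes in target diameter require large changes in pull speed, while the cross-sectional geometry is, to first order, preserved at every scale.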




Abstract: Objective: Probe-based confocal endomicroscopy is an emerging high-magnification optical imaging technique that provides in vivo, in situ cellular-level imaging for real-time assessment of tissue pathology. Endomicroscopy could potentially be used for intraoperative surgical guidance, but assessing a surgical site from individual microscopic images is challenging because of the limited field of view and the difficulty of manually manipulating the probe. Methods: In this paper, a novel robotic device for large-area endomicroscopy imaging is proposed, featuring a rapid yet highly accurate scanning mechanism with image-based motion control that can generate histology-like endomicroscopy mosaics. The device also includes, for the first time in robot-assisted endomicroscopy, the capability to ablate tissue without the need for an additional tool. Results: The device follows pre-programmed trajectories with a positioning accuracy better than 30 μm, and the image-based approach was shown to suppress random motion disturbances of up to 1.25 mm/s. Mosaics are presented for a range of ex vivo human and animal tissues, covering areas of more than 3 mm² scanned in approximately 10 seconds. Conclusion: This work demonstrates the potential of the proposed instrument to generate large-area, high-resolution microscopic images for intraoperative tissue identification and margin assessment. Significance: This approach offers an important alternative to current histology techniques, significantly reducing tissue assessment time while also providing the capability to mark and ablate suspicious areas intraoperatively.
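The abstract does not state how frame-to-frame motion is estimated for the image-based control and mosaicking. A common, simple technique for this is exhaustive block matching: try every integer shift within a search window and keep the one that minimises the mean squared difference over the overlapping pixels. The dependency-free sketch below shows that technique under this assumption. The function name and search radius are illustrative, and practical systems usually prefer normalised cross-correlation or phase correlation with sub-pixel refinement.

```python
from typing import List, Tuple

Image = List[List[float]]


def estimate_shift(ref: Image, mov: Image, max_shift: int) -> Tuple[int, int]:
    """Return the integer (dy, dx) minimising the mean squared difference between
    mov[y][x] and ref[y + dy][x + dx] over their overlap.

    For equally sized frames, (dy, dx) is the offset of the moving frame's origin
    relative to the reference frame, i.e. where to place it in the mosaic.
    """
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            # Only pixels where both mov[y][x] and ref[y + dy][x + dx] exist.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    d = mov[y][x] - ref[y + dy][x + dx]
                    total += d * d
                    count += 1
            if count and total / count < best_cost:
                best, best_cost = (dy, dx), total / count
    return best


# Synthetic check: crop two overlapping 16x16 frames from a larger textured pattern.
def pattern(y: int, x: int) -> float:
    return float((31 * x + 17 * y) ** 2 % 101)


def crop(top: int, left: int, size: int = 16) -> Image:
    return [[pattern(top + y, left + x) for x in range(size)] for y in range(size)]


ref = crop(10, 10)
mov = crop(12, 9)                               # frame moved 2 px down and 1 px left
print(estimate_shift(ref, mov, max_shift=3))    # (2, -1)
```

Accumulating these per-frame offsets gives each frame's position in the mosaic. In a closed loop, the difference between the estimated shift (converted to micrometres via the pixel size) and the commanded displacement could serve as a correction signal; that coupling is likewise an assumption rather than the published design.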