Abstract:In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Abstract:Handheld ultrasound devices face usage limitations due to user inexperience and cannot benefit from supervised deep learning without extensive expert annotations. Moreover, the models trained on standard ultrasound device data are constrained by training data distribution and perform poorly when directly applied to handheld device data. In this study, we propose the Training-free Image Style Alignment (TISA) framework to align the style of handheld device data to those of standard devices. The proposed TISA can directly infer handheld device images without extra training and is suited for clinical applications. We show that TISA performs better and more stably in medical detection and segmentation tasks for handheld device data. We further validate TISA as the clinical model for automatic measurements of spinal curvature and carotid intima-media thickness. The automatic measurements agree well with manual measurements made by human experts and the measurement errors remain within clinically acceptable ranges. We demonstrate the potential for TISA to facilitate automatic diagnosis on handheld ultrasound devices and expedite their eventual widespread use.
Abstract:Objective: The objective of this study is to develop a deep-learning based detection and diagnosis technique for carotid atherosclerosis using a portable freehand 3D ultrasound (US) imaging system. Methods: A total of 127 3D carotid artery datasets were acquired using a portable 3D US imaging system. A U-Net segmentation network was firstly applied to extract the carotid artery on 2D transverse frame, then a novel 3D reconstruction algorithm using fast dot projection (FDP) method with position regularization was proposed to reconstruct the carotid artery volume. Furthermore, a convolutional neural network was used to classify the healthy case and diseased case qualitatively. 3D volume analysis including longitudinal reprojection algorithm and stenosis grade measurement algorithm was developed to obtain the clinical metrics quantitatively. Results: The proposed system achieved sensitivity of 0.714, specificity of 0.851 and accuracy of 0.803 respectively in diagnosis of carotid atherosclerosis. The automatically measured stenosis grade illustrated good correlation (r=0.762) with the experienced expert measurement. Conclusion: the developed technique based on 3D US imaging can be applied to the automatic diagnosis of carotid atherosclerosis. Significance: The proposed deep-learning based technique was specially designed for a portable 3D freehand US system, which can provide carotid atherosclerosis examination more conveniently and decrease the dependence on clinician's experience.