Abstract: In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000).
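To make the split objective concrete, the following is a minimal offline sketch of the regression case: it sorts the observations and scans every candidate threshold, using running sums to evaluate the total squared error of the two sides. This is not the streaming algorithm of the paper (it stores all the data and uses linear space); it only illustrates the quantity being minimized. The function name `best_regression_split` and the convention that the left side holds all points with `x <= threshold` are assumptions made for this illustration.

```python
from typing import List, Tuple


def best_regression_split(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (threshold, sse) minimizing the total squared error of a split.

    Points with x <= threshold go left, the rest go right; each side is
    predicted by its mean label. Offline baseline, NOT a streaming algorithm.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sum = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)

    best_threshold = None
    best_sse = float("inf")
    left_sum = 0.0
    left_sq = 0.0

    for i in range(n - 1):
        x, y = pairs[i]
        left_sum += y
        left_sq += y * y
        # A threshold cannot separate two observations with the same x.
        if pairs[i + 1][0] == x:
            continue
        n_left = i + 1
        n_right = n - n_left
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared deviations from the mean: sum(y^2) - (sum y)^2 / count.
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if sse < best_sse:
            best_sse = sse
            best_threshold = x

    if best_threshold is None:
        raise ValueError("need at least two distinct x values to split")
    return best_threshold, best_sse


if __name__ == "__main__":
    threshold, sse = best_regression_split([1, 2, 3, 4], [0, 0, 10, 10])
    print(threshold, sse)
```

Dividing the returned squared error by the number of observations gives the mean squared error; the minimizing threshold is the same either way. For classification, the same scan can be used with per-class counts on each side in place of the running sums.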
Abstract: In this technical report, we present VinaLLaMA, an open-weight, state-of-the-art (SOTA) Large Language Model for the Vietnamese language, built upon LLaMA-2 with an additional 800 billion trained tokens. VinaLLaMA not only demonstrates fluency in Vietnamese but also exhibits a profound understanding of Vietnamese culture, making it a truly indigenous model. VinaLLaMA-7B-chat, trained on 1 million high-quality synthetic samples, achieves SOTA results on key benchmarks, including VLSP, VMLU, and Vicuna Benchmark Vietnamese, marking a significant advancement in the Vietnamese AI landscape and offering a versatile resource for various applications.
Abstract: Cardiovascular disease remains a significant problem in modern society. Among non-invasive techniques, the electrocardiogram (ECG) is one of the most reliable methods for detecting abnormalities in cardiac activity. However, ECG interpretation requires expert knowledge and is time-consuming. Developing a method to detect the disease early could prevent death and complications. This paper presents several approaches for classifying cardiac diseases from ECG recordings. The first approach combines the Poincaré representation of the ECG signal with deep-learning-based image classifiers (ResNet50 and DenseNet121 were trained on Poincaré diagrams); it showed decent performance in predicting atrial fibrillation (AF) but not other types of arrhythmia. XGBoost, a gradient-boosting model, showed acceptable performance on long-term data but had a long inference time due to computationally expensive calculations in the pre-processing phase. Finally, the 1D convolutional model, specifically the 1D ResNet, showed the best results on both the CinC 2017 and CinC 2020 datasets, reaching F1 scores of 85% and 71%, respectively, which was superior to the first-ranking solution of each challenge. The paper also investigated efficiency metrics such as power consumption and equivalent CO2 emissions, with one-dimensional models such as the 1D CNN and 1D ResNet being the most energy efficient. Model interpretation analysis showed that the DenseNet detected AF using heart rate variability, while the 1D ResNet assessed AF patterns in raw ECG signals.
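As a rough illustration of the Poincaré representation mentioned above, the sketch below builds the common form of the plot, in which each RR interval is plotted against the next one, and computes the standard SD1 and SD2 dispersion descriptors. The abstract does not say how the paper constructs its diagrams, so the use of RR intervals (rather than lagged raw samples), the function names, and the sample values are all assumptions made for this example.

```python
import math
from statistics import pstdev
from typing import List, Tuple


def poincare_points(rr_ms: List[float]) -> List[Tuple[float, float]]:
    """Pair each RR interval with its successor: (RR_n, RR_{n+1})."""
    return list(zip(rr_ms[:-1], rr_ms[1:]))


def sd1_sd2(rr_ms: List[float]) -> Tuple[float, float]:
    """Dispersion across (SD1) and along (SD2) the identity line."""
    points = poincare_points(rr_ms)
    if len(points) < 2:
        raise ValueError("need at least three RR intervals")
    # Rotate the point cloud by 45 degrees: one axis measures the
    # beat-to-beat difference, the other the beat-to-beat sum.
    across = [(b - a) / math.sqrt(2) for a, b in points]
    along = [(a + b) / math.sqrt(2) for a, b in points]
    return pstdev(across), pstdev(along)


if __name__ == "__main__":
    # Hypothetical RR intervals in milliseconds, for illustration only.
    rr = [800.0, 900.0, 800.0, 900.0]
    print(poincare_points(rr))
    print(sd1_sd2(rr))
```

A scattered point cloud with large SD1 reflects irregular beat-to-beat timing. In an image-based pipeline of the kind the abstract describes, points like these would be rendered to an image before being passed to a classifier.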
Abstract: Combining the unmatched soft-tissue imaging capabilities of magnetic resonance imaging (MRI) with high precision robotics has the potential to improve the accuracy, precision, and safety of a wide range of image-guided medical procedures. However, the goal of highly functional MRI-compatible robotic systems has not yet been realized because conventional electromagnetic servomotors used by medical robots can become dangerous projectiles near the strong magnetic field of an MRI scanner. Here we report a novel electromagnetic servomotor design that is constructed from non-magnetic components and can operate within the patient area of clinical scanners. We show that this design enables high-torque and precisely controlled rotary actuation during imaging. Using this servomotor design, an MRI-compatible robot was constructed and tested. The robot demonstrated that the linear forces required to manipulate large diameter surgical instruments in tissues could be achieved during simultaneous imaging with MRI. This work presents the first fully functional electromagnetic servomotor that can be safely operated (while imaging) in the patient area of a 3 Tesla clinical MRI scanner.