NUS
Abstract:Paramagnetic rim lesions (PRLs) are imaging biomarker of the innate immune response in MS lesions. QSM-RimNet, a state-of-the-art tool for PRLs detection on QSM, can identify PRLs but requires precise QSM lesion mask and does not provide rim segmentation. Therefore, the aims of this study are to develop QSM-RimDS algorithm to detect PRLs using the readily available FLAIR lesion mask and to provide rim segmentation for microglial quantification. QSM-RimDS, a deep-learning based tool for joint PRL rim segmentation and PRL detection has been developed. QSM-RimDS has obtained state-of-the art performance in PRL detection and therefore has the potential to be used in clinical practice as a tool to assist human readers for the time-consuming PRL detection and segmentation task. QSM-RimDS is made publicly available [https://github.com/kennyha85/QSM_RimDS]
Abstract:Creating high-quality data for training robust language-instructed agents is a long-lasting challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation. Specifically, SRDF starts with using a base generator to create an initial data pool for training a base navigator, followed by applying the trained navigator to filter the data pool. This leads to higher-fidelity data to train a better generator, which can, in turn, produce higher-quality data for training the next-round navigator. Such a flywheel establishes a data self-refining process, yielding a continuously improved and highly effective dataset for large-scale language-guided navigation learning. Our experiments demonstrate that after several flywheel rounds, the navigator elevates the performance boundary from 70% to 78% SPL on the classic R2R test set, surpassing human performance (76%) for the first time. Meanwhile, this process results in a superior generator, evidenced by a SPICE increase from 23.5 to 26.2, better than all previous VLN instruction generation methods. Finally, we demonstrate the scalability of our method through increasing environment and instruction diversity, and the generalization ability of our pre-trained navigator across various downstream navigation tasks, surpassing state-of-the-art methods by a large margin in all cases.
Abstract:This paper introduces Bidirectional Guidance Informed Trees (BIGIT*),~a new asymptotically optimal sampling-based motion planning algorithm. Capitalizing on the strengths of \emph{meet-in-the-middle} property in bidirectional heuristic search with a new lazy strategy, and uniform-cost search, BIGIT* constructs an implicitly bidirectional preliminary motion tree on an implicit random geometric graph (RGG). This efficiently tightens the informed search region, serving as an admissible and accurate bidirectional guidance heuristic. This heuristic is subsequently utilized to guide a bidirectional heuristic search in finding a valid path on the given RGG. Experiments show that BIGIT* outperforms the existing informed sampling-based motion planners both in faster finding an initial solution and converging to the optimum on simulated abstract problems in $\mathbb{R}^{16}$. Practical drone flight path planning tasks across a campus also verify our results.
Abstract:We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality. In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision encoders, language models, dataset sizes, and test-time configurations. Through extensive evaluations on a wide range of benchmarks, including multi-discipline reasoning, document understanding, multi-image / video understanding, real-world comprehension, multimodal hallucination detection, visual grounding, multilingual capabilities, and pure language processing, InternVL 2.5 exhibits competitive performance, rivaling leading commercial models such as GPT-4o and Claude-3.5-Sonnet. Notably, our model is the first open-source MLLMs to surpass 70% on the MMMU benchmark, achieving a 3.7-point improvement through Chain-of-Thought (CoT) reasoning and showcasing strong potential for test-time scaling. We hope this model contributes to the open-source community by setting new standards for developing and applying multimodal AI systems. HuggingFace demo see https://huggingface.co/spaces/OpenGVLab/InternVL
Abstract:Recent DETR-based methods have advanced the development of Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information. Despite harvesting remarkable progress, existing works follow asynchronous designs, which model video sequences via either video-level queries only or adopting query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of the current solutions, and propose to conduct synchronized modeling via a new framework named SyncVIS. Specifically, SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize video-level query with frame-level query embeddings: a synchronized video-frame modeling paradigm and a synchronized embedding optimization strategy. The former attempts to promote the mutual learning of frame- and video-level embeddings with each other and the latter divides large video sequences into small clips for easier optimization. Extensive experimental evaluations are conducted on the challenging YouTube-VIS 2019 & 2021 & 2022, and OVIS benchmarks and SyncVIS achieves state-of-the-art results, which demonstrates the effectiveness and generality of the proposed approach. The code is available at https://github.com/rkzheng99/SyncVIS.
Abstract:Recently Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms, however, these methods often overlook latent information in less prominent regions, leading to increased sensitivity to perturbations and limited global comprehension. To solve this issue, we introduce PointACL, an attention-driven contrastive learning framework designed to address these limitations. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions, enhancing the understanding of global structures within the point cloud. Then we combine the original pre-training loss with a contrastive learning loss, improving feature discrimination and generalization. Extensive experiments validate the effectiveness of PointACL, as it achieves state-of-the-art performance across a variety of 3D understanding tasks, including object classification, part segmentation, and few-shot learning. Specifically, when integrated with different Transformer backbones like Point-MAE and PointGPT, PointACL demonstrates improved performance on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart. This highlights its superior capability in capturing both global and local features, as well as its enhanced robustness against perturbations and incomplete data.
Abstract:Traffic signs play a key role in assisting autonomous driving systems (ADS) by enabling the assessment of vehicle behavior in compliance with traffic regulations and providing navigation instructions. However, current works are limited to basic sign understanding without considering the egocentric vehicle's spatial position, which fails to support further regulation assessment and direction navigation. Following the above issues, we introduce a new task: traffic sign interpretation from the vehicle's first-person view, referred to as TSI-FPV. Meanwhile, we develop a traffic guidance assistant (TGA) scenario application to re-explore the role of traffic signs in ADS as a complement to popular autonomous technologies (such as obstacle perception). Notably, TGA is not a replacement for electronic map navigation; rather, TGA can be an automatic tool for updating it and complementing it in situations such as offline conditions or temporary sign adjustments. Lastly, a spatial and semantic logic-aware stepwise reasoning pipeline (SignEye) is constructed to achieve the TSI-FPV and TGA, and an application-specific dataset (Traffic-CN) is built. Experiments show that TSI-FPV and TGA are achievable via our SignEye trained on Traffic-CN. The results also demonstrate that the TGA can provide complementary information to ADS beyond existing popular autonomous technologies.
Abstract:Having a better understanding of how locational marginal prices (LMPs) change helps in price forecasting and market strategy making. This paper investigates the fundamental distribution of the congestion part of LMPs in high-dimensional Euclidean space using an unsupervised approach. LMP models based on the lossless and lossy DC optimal power flow (DC-OPF) are analyzed to show the overlapping subspace property of the LMP data. The congestion part of LMPs is spanned by certain row vectors of the power transfer distribution factor (PTDF) matrix, and the subspace attributes of an LMP vector uniquely are found to reflect the instantaneous congestion status of all the transmission lines. The proposed method searches for the basis vectors that span the subspaces of congestion LMP data in hierarchical ways. In the bottom-up search, the data belonging to 1-dimensional subspaces are detected, and other data are projected on the orthogonal subspaces. This procedure is repeated until all the basis vectors are found or the basis gap appears. Top-down searching is used to address the basis gap by hyperplane detection with outliers. Once all the basis vectors are detected, the congestion status can be identified. Numerical experiments based on the IEEE 30-bus system, IEEE 118-bus system, Illinois 200-bus system, and Southwest Power Pool are conducted to show the performance of the proposed method.
Abstract:The two-way flow of information and energy is an important feature of the Energy Internet. Data analytics is a powerful tool in the information flow that aims to solve practical problems using data mining techniques. As the problem of electricity thefts via tampering with smart meters continues to increase, the abnormal behaviors of thefts become more diversified and more difficult to detect. Thus, a data analytics method for detecting various types of electricity thefts is required. However, the existing methods either require a labeled dataset or additional system information which is difficult to obtain in reality or have poor detection accuracy. In this paper, we combine two novel data mining techniques to solve the problem. One technique is the Maximum Information Coefficient (MIC), which can find the correlations between the non-technical loss (NTL) and a certain electricity behavior of the consumer. MIC can be used to precisely detect thefts that appear normal in shapes. The other technique is the clustering technique by fast search and find of density peaks (CFSFDP). CFSFDP finds the abnormal users among thousands of load profiles, making it quite suitable for detecting electricity thefts with arbitrary shapes. Next, a framework for combining the advantages of the two techniques is proposed. Numerical experiments on the Irish smart meter dataset are conducted to show the good performance of the combined method.
Abstract:The growing penetration of electric vehicles (EVs) significantly changes typical load curves in smart grids. With the development of fast charging technology, the volatility of EV charging demand is increasing, which requires additional flexibility for real-time power balance. The forecasting of EV charging demand involves probabilistic modeling of high dimensional time series dynamics across diverse electric vehicle charging stations (EVCSs). This paper studies the forecasting problem of multiple EVCS in a hierarchical probabilistic manner. For each charging station, a deep learning model based on a partial input convex neural network (PICNN) is trained to predict the day-ahead charging demand's conditional distribution, preventing the common quantile crossing problem in traditional quantile regression models. Then, differentiable convex optimization layers (DCLs) are used to reconcile the scenarios sampled from the distributions to yield coherent scenarios that satisfy the hierarchical constraint. It learns a better weight matrix for adjusting the forecasting results of different targets in a machine-learning approach compared to traditional optimization-based hierarchical reconciling methods. Numerical experiments based on real-world EV charging data are conducted to demonstrate the efficacy of the proposed method.