Abstract:Previous studies have demonstrated the strong performance of Graph Neural Networks (GNNs) in node classification. However, most existing GNNs adopt a node-centric perspective and rely on global message passing, leading to high computational and memory costs that hinder scalability. To mitigate these challenges, subgraph-based methods have been introduced, leveraging local subgraphs as approximations of full computational trees. While this approach improves efficiency, it often suffers from performance degradation due to the loss of global contextual information, limiting its effectiveness compared to global GNNs. To address this trade-off between scalability and classification accuracy, we reformulate the node classification task as a subgraph classification problem and propose SubGND (Subgraph GNN for NoDe). This framework introduces a differentiated zero-padding strategy and an Ego-Alter subgraph representation method to resolve label conflicts while incorporating an Adaptive Feature Scaling Mechanism to dynamically adjust feature contributions based on dataset-specific dependencies. Experimental results on six benchmark datasets demonstrate that SubGND achieves performance comparable to or surpassing global message-passing GNNs, particularly in heterophilic settings, highlighting its effectiveness and scalability as a promising solution for node classification.
Abstract:In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled B\v{a}il\'ing in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.
Abstract:Recent advancements in 3D multi-object tracking (3D MOT) have predominantly relied on tracking-by-detection pipelines. However, these approaches often neglect potential enhancements in 3D detection processes, leading to high false positives (FP), missed detections (FN), and identity switches (IDS), particularly in challenging scenarios such as crowded scenes, small-object configurations, and adverse weather conditions. Furthermore, limitations in data preprocessing, association mechanisms, motion modeling, and life-cycle management hinder overall tracking robustness. To address these issues, we present Easy-Poly, a real-time, filter-based 3D MOT framework for multiple object categories. Our contributions include: (1) An Augmented Proposal Generator utilizing multi-modal data augmentation and refined SpConv operations, significantly improving mAP and NDS on nuScenes; (2) A Dynamic Track-Oriented (DTO) data association algorithm that effectively manages uncertainties and occlusions through optimal assignment and multiple hypothesis handling; (3) A Dynamic Motion Modeling (DMM) incorporating a confidence-weighted Kalman filter and adaptive noise covariances, enhancing MOTA and AMOTA in challenging conditions; and (4) An extended life-cycle management system with adjustive thresholds to reduce ID switches and false terminations. Experimental results show that Easy-Poly outperforms state-of-the-art methods such as Poly-MOT and Fast-Poly, achieving notable gains in mAP (e.g., from 63.30% to 64.96% with LargeKernel3D) and AMOTA (e.g., from 73.1% to 74.5%), while also running in real-time. These findings highlight Easy-Poly's adaptability and robustness in diverse scenarios, making it a compelling choice for autonomous driving and related 3D MOT applications. The source code of this paper will be published upon acceptance.
Abstract:While recent low-cost radar-camera approaches have shown promising results in multi-modal 3D object detection, both sensors face challenges from environmental and intrinsic disturbances. Poor lighting or adverse weather conditions degrade camera performance, while radar suffers from noise and positional ambiguity. Achieving robust radar-camera 3D object detection requires consistent performance across varying conditions, a topic that has not yet been fully explored. In this work, we first conduct a systematic analysis of robustness in radar-camera detection on five kinds of noises and propose RobuRCDet, a robust object detection model in BEV. Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity. The 3DGE uses RCS and velocity priors to generate a deformable kernel map and variance for kernel size adjustment and value distribution. Additionally, we introduce a weather-adaptive fusion module, which adaptively fuses radar and camera features based on camera signal confidence. Extensive experiments on the popular benchmark, nuScenes, show that our model achieves competitive results in regular and noisy conditions.
Abstract:Adaptation to unpredictable damages is crucial for autonomous legged robots, yet existing methods based on multi-policy or meta-learning frameworks face challenges like limited generalization and complex maintenance. To address this issue, we first analyze and summarize eight types of damage scenarios, including sensor failures and joint malfunctions. Then, we propose a novel, model-free, two-stage training framework, Unified Malfunction Controller (UMC), incorporating a masking mechanism to enhance damage resilience. Specifically, the model is initially trained with normal environments to ensure robust performance under standard conditions. In the second stage, we use masks to prevent the legged robot from relying on malfunctioning limbs, enabling adaptive gait and movement adjustments upon malfunction. Experimental results demonstrate that our approach improves the task completion capability by an average of 36% for the transformer and 39% for the MLP across three locomotion tasks. The source code and trained models will be made available to the public.
Abstract:No-Reference Image Quality Assessment for distorted images has always been a challenging problem due to image content variance and distortion diversity. Previous IQA models mostly encode explicit single-quality features of synthetic images to obtain quality-aware representations for quality score prediction. However, performance decreases when facing real-world distortion and restored images from restoration models. The reason is that they do not consider the degradation factors of the low-quality images adequately. To address this issue, we first introduce the DRI method to obtain degradation vectors and quality vectors of images, which separately model the degradation and quality information of low-quality images. After that, we add the restoration network to provide the MOS score predictor with degradation information. Then, we design the Representation-based Semantic Loss (RS Loss) to assist in enhancing effective interaction between representations. Extensive experimental results demonstrate that the proposed method performs favorably against existing state-of-the-art models on both synthetic and real-world datasets.
Abstract:All-in-one image restoration aims to handle multiple degradation types using one model. This paper proposes a simple pipeline for all-in-one blind image restoration to Restore Anything with Masks (RAM). We focus on the image content by utilizing Mask Image Modeling to extract intrinsic image information rather than distinguishing degradation types like other methods. Our pipeline consists of two stages: masked image pre-training and fine-tuning with mask attribute conductance. We design a straightforward masking pre-training approach specifically tailored for all-in-one image restoration. This approach enhances networks to prioritize the extraction of image content priors from various degradations, resulting in a more balanced performance across different restoration tasks and achieving stronger overall results. To bridge the gap of input integrity while preserving learned image priors as much as possible, we selectively fine-tuned a small portion of the layers. Specifically, the importance of each layer is ranked by the proposed Mask Attribute Conductance (MAC), and the layers with higher contributions are selected for finetuning. Extensive experiments demonstrate that our method achieves state-of-the-art performance. Our code and model will be released at \href{https://github.com/Dragonisss/RAM}{https://github.com/Dragonisss/RAM}.
Abstract:Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-collaboration (SC) strategy for existing restoration models. This strategy utilizes information from the previous stage as feedback to guide subsequent stages, achieving significant performance improvement without increasing the framework's inference complexity. The SC strategy comprises a prompt learning (PL) module and a restorer ($Res$). It iteratively replaces the previous less powerful fixed restorer $\overline{Res}$ in the PL module with a more powerful $Res$. The enhanced PL module generates better pseudo-degraded/clean image pairs, leading to a more powerful $Res$ for the next iteration. Our SC can significantly improve the $Res$'s performance by over 1.5 dB without adding extra parameters or computational complexity during inference. Meanwhile, existing self-ensemble (SE) and our SC strategies enhance the performance of pre-trained restorers from different perspectives. As SE increases computational complexity during inference, we propose a re-boosting module to the SC (Reb-SC) to improve the SC strategy further by incorporating SE into SC without increasing inference time. This approach further enhances the restorer's performance by approximately 0.3 dB. Extensive experimental results on restoration tasks demonstrate that the proposed model performs favorably against existing state-of-the-art unsupervised restoration methods. Source code and trained models are publicly available at: \url{https://github.com/linxin0/RSCP2GAN}.
Abstract:Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridge this gap, we present a unified, modularized, and extensive library, EduNLP, focusing on educational resource understanding. In the library, we decouple the whole workflow to four key modules with consistent interfaces including data configuration, processing, model implementation, and model evaluation. We also provide a configurable pipeline to unify the data usage and model usage in standard ways, where users can customize their own needs. For the current version, we primarily provide 10 typical models from four categories, and 5 common downstream-evaluation tasks in the education domain on 8 subjects for users' usage. The project is released at: https://github.com/bigdata-ustc/EduNLP.
Abstract:Math world problems correction(MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging the advancements in large language models (LLMs), we address two key objectives:(1) Distinguishing between mathematical reasoning and error correction; (2) Exploring strategies to enhance the error correction capabilities of LLMs in mathematics to solve MWPC task. We noticed that, in real-time education,assisting students in recognizing their mistakes is more crucial than simply providing correct answers. However, current research tends to prioritize obtaining accurate solutions to math problems rather than correcting potentially incorrect ones. Therefore, we modify the research paradigm, demonstrating that improving mathematical reasoning abilities does not equate to mastery in error correction. Meanwhile, we propose a novel method called diagnostic-oriented promping(DOP) aimed at facilitating LLMs to excel in error correction. In experiments, DOP has shown outstanding performance, highlighting its significant impact. We argue that in mathematical education, the demand for outstanding correctors surpasses that for proficient reasoners. Codes and data are available on https://github.com/ChenhaoEcnuCS/Reason-Correct.