Alibaba Group
Abstract:The integration of sensing and communication (ISAC) is a cornerstone of 6G, enabling simultaneous environmental awareness and communication. This paper explores radio SLAM (simultaneous localization and mapping) as a key ISAC approach, using radio signals for mapping and localization. We analyze radio SLAM across different frequency bands, discussing trade-offs in coverage, resolution, and hardware requirements. We also highlight opportunities for integration with sensing, positioning, and cooperative networks. The findings pave the way for standardized solutions in 6G applications such as autonomous systems and industrial robotics.
Abstract:Hyperspectral unmixing (HU) targets to decompose each mixed pixel in remote sensing images into a set of endmembers and their corresponding abundances. Despite significant progress in this field using deep learning, most methods fail to simultaneously characterize global dependencies and local consistency, making it difficult to preserve both long-range interactions and boundary details. This letter proposes a novel transformer-guided content-adaptive graph unmixing framework (T-CAGU), which overcomes these challenges by employing a transformer to capture global dependencies and introducing a content-adaptive graph neural network to enhance local relationships. Unlike previous work, T-CAGU integrates multiple propagation orders to dynamically learn the graph structure, ensuring robustness against noise. Furthermore, T-CAGU leverages a graph residual mechanism to preserve global information and stabilize training. Experimental results demonstrate its superiority over the state-of-the-art methods. Our code is available at https://github.com/xianchaoxiu/T-CAGU.
Abstract:Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, the research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of legal claim generation based on the given case's facts. First, we construct ClaimGen-CN, the first dataset for Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored for assessing the generated claims, which encompasses two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of the current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.
Abstract:The Unit Commitment (UC) problem is a classic challenge in the optimal scheduling of power systems. Years of research and practice have shown that formulating reasonable unit commitment plans can significantly improve the economic efficiency of power systems' operations. In recent years, with the introduction of technologies such as machine learning and the Lagrangian relaxation method, the solution methods for the UC problem have become increasingly diversified, but still face challenges in terms of accuracy and robustness. This paper proposes a Function Space Search (FunSearch) method based on large language models. This method combines pre-trained large language models and evaluators to creatively generate solutions through the program search and evolution process while ensuring their rationality. In simulation experiments, a case of unit commitment with \(10\) units is used mainly. Compared to the genetic algorithm, the results show that FunSearch performs better in terms of sampling time, evaluation time, and total operating cost of the system, demonstrating its great potential as an effective tool for solving the UC problem.
Abstract:Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results--posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.
Abstract:Video Large Language Models (Video LLMs) have achieved remarkable results in video understanding tasks. However, they often suffer from heavy computational overhead due to the large number of visual tokens generated from multiple video frames. Existing visual token compression methods often rely on attention scores from language models as guidance. However, these scores exhibit inherent biases: global bias reflects a tendency to focus on the two ends of the visual token sequence, while local bias leads to an over-concentration on the same spatial positions across different frames. To address the issue of attention bias, we propose $\textbf{A}$ttention-$\textbf{D}$ebi$\textbf{a}$sed $\textbf{T}$oken $\textbf{P}$runing for Video Large Language Models ($\textbf{AdaTP}$), a novel token pruning pipeline for Video LLMs. AdaTP integrates two dedicated debiasing modules into the pipeline, targeting global attention bias and local attention bias, respectively. Without the need for additional training, our method significantly reduces the computational overhead of Video LLMs while retaining the performance of vanilla models. Extensive evaluation shows that AdaTP achieves state-of-the-art performance in various commonly used video understanding benchmarks. In particular, on LLaVA-OneVision-7B, AdaTP maintains performance without degradation while using only up to $27.3\%$ FLOPs compared to the vanilla model. Our code will be released soon.
Abstract:Large Language Models (LLMs) have achieved impressive performance across a range of natural language processing tasks. However, recent advances demonstrate that further gains particularly in complex reasoning tasks require more than merely scaling up model sizes or training data. One promising direction is to enable models to think during the reasoning process. Recently, Quiet STaR significantly improves reasoning by generating token-level thought traces, but incurs substantial inference overhead. In this work, we propose Fast Quiet STaR, a more efficient reasoning framework that preserves the benefits of token-level reasoning while reducing computational cost. Our method introduces a curriculum learning based training strategy that gradually reduces the number of thought tokens, enabling the model to internalize more abstract and concise reasoning processes. We further extend this approach to the standard Next Token Prediction (NTP) setting through reinforcement learning-based fine-tuning, resulting in Fast Quiet-STaR NTP, which eliminates the need for explicit thought token generation during inference. Experiments on four benchmark datasets with Mistral 7B and Qwen2.5 7B demonstrate that Fast Quiet-STaR consistently outperforms Quiet-STaR in terms of average accuracy under the same inference time budget. Notably, Fast Quiet-STaR NTP achieves an average accuracy improvement of 9\% on Mistral 7B and 5.7\% on Qwen2.5 7B, while maintaining the same inference latency. Our code will be available at https://github.com/huangwei200012/Fast-Quiet-STaR.
Abstract:High-accuracy localization is a key enabler for integrated sensing and communication (ISAC), playing an essential role in various applications such as autonomous driving. Antenna arrays and reconfigurable intelligent surface (RIS) are incorporated into these systems to achieve high angular resolution, assisting in the localization process. However, array and RIS beam patterns in practice often deviate from the idealized models used for algorithm design, leading to significant degradation in positioning accuracy. This mismatch highlights the need for beam calibration to bridge the gap between theoretical models and real-world hardware behavior. In this paper, we present and analyze three beam models considering several key non-idealities such as mutual coupling, non-ideal codebook, and measurement uncertainties. Based on the models, we then develop calibration algorithms to estimate the model parameters that can be used for future localization tasks. This work evaluates the effectiveness of the beam models and the calibration algorithms using both theoretical bounds and real-world beam pattern data from an RIS prototype. The simulation results show that the model incorporating combined impacts can accurately reconstruct measured beam patterns. This highlights the necessity of realistic beam modeling and calibration to achieve high-accuracy localization.
Abstract:Reconfigurable intelligent surfaces (RISs) have the potential to significantly enhance the performance of integrated sensing and communication (ISAC) systems, particularly in line-of-sight (LoS) blockage scenarios. However, as larger RISs are integrated into ISAC systems, mutual coupling (MC) effects between RIS elements become more pronounced, leading to a substantial degradation in performance, especially for localization applications. In this paper, we first conduct a misspecified and standard Cram\'er-Rao bound analysis to quantify the impact of MC on localization performance, demonstrating severe degradations in accuracy, especially when MC is ignored. Building on this, we propose a novel joint user equipment localization and RIS MC parameter estimation (JLMC) method in near-field wireless systems. Our two-stage MC-aware approach outperforms classical methods that neglect MC, significantly improving localization accuracy and overall system performance. Simulation results validate the effectiveness and advantages of the proposed method in realistic scenarios.
Abstract:Integrated sensing and communication (ISAC) enables radio systems to simultaneously sense and communicate with their environment. This paper, developed within the Hexa-X-II project funded by the European Union, presents a comprehensive cross-layer vision for ISAC in 6G networks, integrating insights from physical-layer design, hardware architectures, AI-driven intelligence, and protocol-level innovations. We begin by revisiting the foundational principles of ISAC, highlighting synergies and trade-offs between sensing and communication across different integration levels. Enabling technologies, such as multiband operation, massive and distributed MIMO, non-terrestrial networks, reconfigurable intelligent surfaces, and machine learning, are analyzed in conjunction with hardware considerations including waveform design, synchronization, and full-duplex operation. To bridge implementation and system-level evaluation, we introduce a quantitative cross-layer framework linking design parameters to key performance and value indicators. By synthesizing perspectives from both academia and industry, this paper outlines how deeply integrated ISAC can transform 6G into a programmable and context-aware platform supporting applications from reliable wireless access to autonomous mobility and digital twinning.