Alibaba Group
Abstract:In complex scenes and varied conditions, effectively integrating spatial-temporal context is crucial for accurately identifying changes. However, current RS-CD methods lack a balanced consideration of performance and efficiency. CNNs lack global context, Transformers have quadratic computational complexity, and Mambas are restricted by CUDA acceleration. In this paper, we propose CDXFormer, with a core component that is a powerful XLSTM-based feature enhancement layer, integrating the advantages of linear computational complexity, global context perception, and strong interpret-ability. Specifically, we introduce a scale-specific Feature Enhancer layer, incorporating a Cross-Temporal Global Perceptron customized for semantic-accurate deep features, and a Cross-Temporal Spatial Refiner customized for detail-rich shallow features. Additionally, we propose a Cross-Scale Interactive Fusion module to progressively interact global change representations with spatial responses. Extensive experimental results demonstrate that CDXFormer achieves state-of-the-art performance across three benchmark datasets, offering a compelling balance between efficiency and accuracy. Code is available at https://github.com/xwmaxwma/rschange.
Abstract:As Large Language Models (LLMs) continue to advance in natural language processing (NLP), their ability to stably follow instructions in long-context inputs has become crucial for real-world applications. While existing benchmarks assess various LLM capabilities, they rarely focus on instruction-following in long-context scenarios or stability on different inputs. In response, we introduce the Long-context Instruction-Following Benchmark (LIFBench), a scalable dataset designed to evaluate LLMs' instruction-following capabilities and stability across long contexts. LIFBench comprises three long-context scenarios and eleven diverse tasks, supported by 2,766 instructions generated through an automated expansion method across three dimensions: length, expression, and variables. For evaluation, we propose LIFEval, a rubric-based assessment framework that provides precise, automated scoring of complex LLM responses without relying on LLM-assisted evaluations or human judgments. This approach facilitates a comprehensive analysis of model performance and stability across various perspectives. We conduct extensive experiments on 20 notable LLMs across six length intervals, analyzing their instruction-following capabilities and stability. Our work contributes LIFBench and LIFEval as robust tools for assessing LLM performance in complex, long-context settings, providing insights that can inform future LLM development.
Abstract:Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippet and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks primarily focus on Python and are often limited in terms of language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.6K test samples of 18 programming languages and covers the automated program repair (APR) task, the code review (CR) task, and the bug identification (BI) task. Further, we introduce the debugging instruction corpora MDEVAL-INSTRUCT by injecting bugs into the correct multilingual queries and solutions (xDebugGen). Further, a multilingual debugger xDebugCoder trained on MDEVAL-INSTRUCT as a strong baseline specifically to handle the bugs of a wide range of programming languages (e.g. "Missing Mut" in language Rust and "Misused Macro Definition" in language C). Our extensive experiments on MDEVAL reveal a notable performance gap between open-source models and closed-source LLMs (e.g., GPT and Claude series), highlighting huge room for improvement in multilingual code debugging scenarios.
Abstract:Retrieval-augmented generation methods often neglect the quality of content retrieved from external knowledge bases, resulting in irrelevant information or potential misinformation that negatively affects the generation results of large language models. In this paper, we propose an end-to-end model with adaptive filtering for retrieval-augmented generation (E2E-AFG), which integrates answer existence judgment and text generation into a single end-to-end framework. This enables the model to focus more effectively on relevant content while reducing the influence of irrelevant information and generating accurate answers. We evaluate E2E-AFG on six representative knowledge-intensive language datasets, and the results show that it consistently outperforms baseline models across all tasks, demonstrating the effectiveness and robustness of the proposed approach.
Abstract:Instruction-following capabilities in large language models (LLMs) have significantly progressed, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances, most of them still relies on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) to accurately assess the models' responsiveness to instructions. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still face challenges in handling certain attributes. Moreover, although instruction fine-tuning and increased model size lead to better performance, most models fall short of achieving comprehensive instruction compliance as assessed by our benchmark.
Abstract:Communication enables the expansion of human visual perception beyond the limitations of time and distance, while computational imaging overcomes the constraints of depth and breadth. Although impressive achievements have been witnessed with the two types of technologies, the occlusive information flow between the two domains is a bottleneck hindering their ulterior progression. Herein, we propose a novel framework that integrates communication and computational imaging (ICCI) to break through the inherent isolation between communication and computational imaging for remote perception. By jointly considering the sensing and transmitting of remote visual information, the ICCI framework performs a full-link information transfer optimization, aiming to minimize information loss from the generation of the information source to the execution of the final vision tasks. We conduct numerical analysis and experiments to demonstrate the ICCI framework by integrating communication systems and snapshot compressive imaging systems. Compared with straightforward combination schemes, which sequentially execute sensing and transmitting, the ICCI scheme shows greater robustness against channel noise and impairments while achieving higher data compression. Moreover, an 80 km 27-band hyperspectral video perception with a rate of 30 fps is experimentally achieved. This new ICCI remote perception paradigm offers a highefficiency solution for various real-time computer vision tasks.
Abstract:Obtaining well-calibrated photometric redshift probability densities for galaxies without a spectroscopic measurement remains a challenge. Deep learning discriminative models, typically fed with multi-band galaxy images, can produce outputs that mimic probability densities and achieve state-of-the-art accuracy. However, such models may be affected by miscalibration that would result in discrepancies between the model outputs and the actual distributions of true redshifts. Our work develops a novel method called the Contrastive Learning and Adaptive KNN for Photometric Redshift (CLAP) that resolves this issue. It leverages supervised contrastive learning (SCL) and k-nearest neighbours (KNN) to construct and calibrate raw probability density estimates, and implements a refitting procedure to resume end-to-end discriminative models ready to produce final estimates for large-scale imaging data. The harmonic mean is adopted to combine an ensemble of estimates from multiple realisations for improving accuracy. Our experiments demonstrate that CLAP takes advantage of both deep learning and KNN, outperforming benchmark methods on the calibration of probability density estimates and retaining high accuracy and computational efficiency. With reference to CLAP, we point out that miscalibration is particularly sensitive to the method-induced excessive correlations among data instances in addition to the unaccounted-for epistemic uncertainties. Reducing the uncertainties may not guarantee the removal of miscalibration due to the presence of such excessive correlations, yet this is a problem for conventional deep learning methods rather than CLAP. These discussions underscore the robustness of CLAP for obtaining photometric redshift probability densities required by astrophysical and cosmological applications. This is the first paper in our series on CLAP.
Abstract:Uncertainty is a fundamental aspect of real-world scenarios, where perfect information is rarely available. Humans naturally develop complex internal models to navigate incomplete data and effectively respond to unforeseen or partially observed events. In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples. However, it does not guarantee well-calibrated predicted probabilities and may result in models that are overconfident or underconfident. High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability. This research introduces a novel loss function called Focal Calibration Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples. By minimizing the Euclidean norm through a strictly proper loss, FCL penalizes the instance-wise calibration error and constrains bounds. We provide theoretical validation for proposed method and apply it to calibrate CheXNet for potential deployment in web-based health-care systems. Extensive evaluations on various models and datasets demonstrate that our method achieves SOTA performance in both calibration and accuracy metrics.
Abstract:The Soft Actor-Critic (SAC) algorithm is known for its stability and high sample efficiency in deep reinforcement learning. However, the tanh transformation applied to sampled actions in SAC distorts the action distribution, hindering the selection of the most probable actions. This paper presents a novel action sampling method that directly identifies and selects the most probable actions within the transformed distribution, thereby addressing this issue. Extensive experiments on standard continuous control benchmarks demonstrate that the proposed method significantly enhances SAC's performance, resulting in faster convergence and higher cumulative rewards compared to the original algorithm.
Abstract:The vast adoption of Wi-Fi and/or Bluetooth capabilities in Internet of Things (IoT) devices, along with the rapid growth of deployed smart devices, has caused significant interference and congestion in the industrial, scientific, and medical (ISM) bands. Traditional Wi-Fi Medium Access Control (MAC) design faces significant challenges in managing increasingly complex wireless environments while ensuring network Quality of Service (QoS) performance. This paper explores the potential integration of advanced Artificial Intelligence (AI) methods into the design of Wi-Fi MAC protocols. We propose AI-MAC, an innovative approach that employs machine learning algorithms to dynamically adapt to changing network conditions, optimize channel access, mitigate interference, and ensure deterministic latency. By intelligently predicting and managing interference, AI-MAC aims to provide a robust solution for next generation of Wi-Fi networks, enabling seamless connectivity and enhanced QoS. Our experimental results demonstrate that AI-MAC significantly reduces both interference and latency, paving the way for more reliable and efficient wireless communications in the increasingly crowded ISM band.