Department of Information Technology, Uppsala University, Uppsala, Sweden
Abstract:Graph Neural Networks (GNNs) have gained traction in Graph-based Machine Learning as a Service (GMLaaS) platforms, yet they remain vulnerable to graph-based model extraction attacks (MEAs), where adversaries reconstruct surrogate models by querying the victim model. Existing defense mechanisms, such as watermarking and fingerprinting, suffer from poor real-time performance, susceptibility to evasion, or reliance on post-attack verification, making them inadequate for handling the dynamic characteristics of graph-based MEA variants. To address these limitations, we propose ATOM, a novel real-time MEA detection framework tailored for GNNs. ATOM integrates sequential modeling and reinforcement learning to dynamically detect evolving attack patterns, while leveraging $k$-core embedding to capture the structural properties, enhancing detection precision. Furthermore, we provide theoretical analysis to characterize query behaviors and optimize detection strategies. Extensive experiments on multiple real-world datasets demonstrate that ATOM outperforms existing approaches in detection performance, maintaining stable across different time steps, thereby offering a more effective defense mechanism for GMLaaS environments.
Abstract:Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.
Abstract:Computed tomography (CT) is a key imaging modality for diagnosis, yet its clinical utility is marred by high radiation exposure and long turnaround times, restricting its use for larger-scale screening. Although chest radiography (CXR) is more accessible and safer, existing CXR foundation models focus primarily on detecting diseases that are readily visible on the CXR. Recently, works have explored training disease classification models on simulated CXRs, but they remain limited to recognizing a single disease type from CT. CT foundation models have also emerged with significantly improved detection of pathologies in CT. However, the generalized application of CT-derived labels on CXR has remained illusive. In this study, we propose X2CT-CLIP, a tri-modal knowledge transfer learning framework that bridges the modality gap between CT and CXR while reducing the computational burden of model training. Our approach is the first work to enable multi-abnormality classification in CT, using CXR, by transferring knowledge from 3D CT volumes and associated radiology reports to a CXR encoder via a carefully designed tri-modal alignment mechanism in latent space. Extensive evaluations on three multi-label CT datasets demonstrate that our method outperforms state-of-the-art baselines in cross-modal retrieval, few-shot adaptation, and external validation. These results highlight the potential of CXR, enriched with knowledge derived from CT, as a viable efficient alternative for disease detection in resource-limited settings.
Abstract:Next-generation mobile networks are set to utilize integrated sensing and communication (ISAC) as a critical technology, providing significant support for sectors like the industrial Internet of Things (IIoT), extended reality (XR), and smart home applications. A key challenge in ISAC implementation is the extraction of sensing parameters from radio signals, a task that conventional methods struggle to achieve due to the complexity of acquiring sensing channel data. In this paper, we introduce a novel auto-encoder (AE)-based framework to acquire sensing information using channel state information (CSI). Specifically, our framework, termed C2S (CSI to sensing)-AE, learns the relationship between CSI and the delay power spectrum (DPS), from which the range information can be readily accessed. To validate our framework's performance, we conducted measurements of DPS and CSI in real-world scenarios and introduced the dataset 'SHU7'. Our extensive experiments demonstrate that the framework excels in C2S extrapolation, surpassing existing methods in terms of accuracy for both delay and signal strength of individual paths. This innovative approach holds the potential to greatly enhance sensing capabilities in future mobile networks, paving the way for more robust and versatile ISAC applications.
Abstract:In modern wireless communication systems, the effective processing of Channel State Information (CSI) is crucial for enhancing communication quality and reliability. However, current methods often handle different tasks in isolation, thereby neglecting the synergies among various tasks and leading to extract CSI features inadequately for subsequent analysis. To address these limitations, this paper introduces a novel Multi-Task Channel Analysis framework named MTCA, aimed at improving the performance of wireless communication even sensing. MTCA is designed to handle four critical tasks, including channel prediction, antenna-domain channel extrapolation, channel identification, and scenario classification. Experiments conducted on a multi-scenario, multi-antenna dataset tailored for UAV-based communications demonstrate that the proposed MTCA exhibits superior comprehension of CSI, achieving enhanced performance across all evaluated tasks. Notably, MTCA reached 100% prediction accuracy in channel identification and scenario classification. Compared to the previous state-of-the-art methods, MTCA improved channel prediction performance by 20.1% and antenna-domain extrapolation performance by 54.5%.
Abstract:In practice, physical spatiotemporal forecasting can suffer from data scarcity, because collecting large-scale data is non-trivial, especially for extreme events. Hence, we propose \method{}, a novel probabilistic framework to realize iterative self-training with new self-ensemble strategies, achieving better physical consistency and generalization on extreme events. Following any base forecasting model, we can encode its deterministic outputs into a latent space and retrieve multiple codebook entries to generate probabilistic outputs. Then BeamVQ extends the beam search from discrete spaces to the continuous state spaces in this field. We can further employ domain-specific metrics (e.g., Critical Success Index for extreme events) to filter out the top-k candidates and develop the new self-ensemble strategy by combining the high-quality candidates. The self-ensemble can not only improve the inference quality and robustness but also iteratively augment the training datasets during continuous self-training. Consequently, BeamVQ realizes the exploration of rare but critical phenomena beyond the original dataset. Comprehensive experiments on different benchmarks and backbones show that BeamVQ consistently reduces forecasting MSE (up to 39%), enhancing extreme events detection and proving its effectiveness in handling data scarcity.
Abstract:In the field of artificial intelligence, self-supervised learning has demonstrated superior generalization capabilities by leveraging large-scale unlabeled datasets for pretraining, which is especially critical for wireless communication models to adapt to a variety of scenarios. This paper innovatively treats Channel State Information (CSI) and Channel Impulse Response (CIR) as naturally aligned multi-modal data and proposes the first MIMO wireless channel foundation model, named CSI-CLIP. By effectively capturing the joint representations of both CIR and CSI, CSI-CLIP exhibits remarkable adaptability across scenarios and robust feature extraction capabilities. Experimental results show that in positioning task, CSI-CLIP reduces the mean error distance by 22%; in beam management task, it increases accuracy by 1% compared to traditional supervised methods, as well as in the channel identification task. These improvements not only highlight the potential and value of CSI-CLIP in integrating sensing and communication but also demonstrate its significant advantages over existing techniques. Moreover, viewing CSI and CIR as multi-modal pairs and contrastive learning for wireless channel foundation model open up new research directions in the domain of MIMO wireless communications.
Abstract:Large Language Models (LLMs) can capture nuanced contextual relationships, reasoning, and complex problem-solving. By leveraging their ability to process and interpret large-scale information, LLMs have shown potential to address domain-specific challenges, including those in autonomous driving systems. This paper proposes a novel framework that leverages LLMs for risk-aware analysis of generated driving scenarios. We hypothesize that LLMs can effectively evaluate whether driving scenarios generated by autonomous driving testing simulators are safety-critical. To validate this hypothesis, we conducted an empirical evaluation to assess the effectiveness of LLMs in performing this task. This framework will also provide feedback to generate the new safety-critical scenario by using adversarial method to modify existing non-critical scenarios and test their effectiveness in validating motion planning algorithms. Code and scenarios are available at: https://github.com/yuangao-tum/Riskaware-Scenario-analyse
Abstract:Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Partnership Project (3GPP) standards, AI/machine learning (ML)-based positioning is becoming a key technology to overcome the limitations of traditional methods. This paper begins with an introduction to the fundamentals of AI and wireless positioning, covering AI models, algorithms, positioning applications, emerging wireless technologies, and the basics of positioning techniques. Subsequently, focusing on standardization progress, we provide a comprehensive review of the evolution of 3GPP positioning standards, with an emphasis on the integration of AI/ML technologies in recent and upcoming releases. Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research. we focus on state-of-the-art (SOTA) research in AI-based line-of-sight (LOS)/non-line-of-sight (NLOS) detection, time of arrival (TOA)/time difference of arrival (TDOA) estimation, and angle estimation techniques. For Direct AI/ML Positioning, we explore SOTA advancements in fingerprint-based positioning, knowledge-assisted AI positioning, and channel charting-based positioning. Furthermore, we introduce publicly available datasets for wireless positioning and conclude by summarizing the challenges and opportunities of AI-driven wireless positioning.
Abstract:Recently, vision-language models have made remarkable progress, demonstrating outstanding capabilities in various tasks such as image captioning and video understanding. We introduce Valley2, a novel multimodal large language model designed to enhance performance across all domains and extend the boundaries of practical applications in e-commerce and short video scenarios. Notably, Valley2 achieves state-of-the-art (SOTA) performance on e-commerce benchmarks, surpassing open-source models of similar size by a large margin (79.66 vs. 72.76). Additionally, Valley2 ranks second on the OpenCompass leaderboard among models with fewer than 10B parameters, with an impressive average score of 67.4. The code and model weights are open-sourced at https://github.com/bytedance/Valley.