Abstract:Code translation converts code from one programming language to another while preserving its original functionality, which is crucial for software migration, system refactoring, and cross-platform development. Traditional rule-based methods rely on manually written rules, which are time-consuming to create and often yield less readable code. To overcome this, learning-based methods have been developed that leverage parallel data to train models for automated code translation. More recently, advances in Large Language Models (LLMs) have further boosted learning-based code translation. Although promising, LLM-translated programs still suffer from diverse quality issues (e.g., syntax errors and semantic errors). In particular, LLMs can struggle to self-debug these errors when simply given the corresponding error messages. In this work, we propose TRANSAGENT, a novel LLM-based multi-agent system that enhances LLM-based code translation by fixing syntax and semantic errors through the synergy of four LLM-based agents: Initial Code Translator, Syntax Error Fixer, Code Aligner, and Semantic Error Fixer. The main insight of TRANSAGENT is to first localize the erroneous code block in the target program based on the execution alignment between the target and source programs, which narrows down the fixing space and thus lowers the fixing difficulty. To evaluate TRANSAGENT, we first construct a new benchmark from recent programming tasks to mitigate the potential data leakage issue. On our benchmark, TRANSAGENT outperforms the latest LLM-based code translation technique UniTrans in both translation effectiveness and efficiency; additionally, our evaluation on different LLMs shows the generalization of TRANSAGENT, and our ablation study shows the contribution of each agent.
Abstract:6G communication will greatly benefit from extremely large-scale antenna arrays (ELAAs) and the new mid-band spectrum (7-24 GHz). These techniques require a thorough exploration of the challenges and potential of the associated near-field (NF) phenomena. It is crucial to develop accurate NF channel models that capture spherical wave propagation and spatial non-stationarity (SnS). However, channel measurement campaigns for mid-band ELAA systems have rarely been reported in the literature. To this end, this work develops a channel sounder dedicated to mid-band ELAA systems, based on a distributed modular vector network analyzer (VNA) incorporating radio-over-fiber (RoF), phase compensation, and virtual antenna array schemes. Because it is built on an off-the-shelf VNA, this novel channel-sounding testbed is generic and easily accessible, and therefore has the potential to enable large-scale experimentation. The main challenges and solutions for developing NF channel models for mid-band ELAA systems are discussed, including channel sounders, multipath parameter estimation algorithms, and channel modeling frameworks. In addition, the study reports a measurement campaign in an indoor scenario using a 720-element virtual uniform circular array ELAA operating at 16-20 GHz, highlighting the presence of spherical wavefronts and spatially non-stationary effects. The effectiveness of the proposed near-field channel parameter estimator and channel modeling framework is also demonstrated using the measurement data.
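To make the spherical-wavefront effect in the abstract above concrete, the sketch below compares a near-field (spherical) steering vector with a far-field (planar) one on a uniform circular array. It is a minimal illustration, not the paper's parameter estimator: the array radius, carrier frequency, and source positions are hypothetical values, and only the 720-element count and the 16-20 GHz band come from the abstract. The helper `rayleigh_distance` uses the common 2D²/λ far-field rule of thumb, and all function names are hypothetical.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def uca_positions(n_elements, radius):
    """(x, y) coordinates of a uniform circular array centred at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n_elements),
         radius * math.sin(2 * math.pi * k / n_elements))
        for k in range(n_elements)
    ]


def steering_spherical(positions, src, freq):
    """Near-field model: each element's phase follows its exact distance to the source,
    referenced to the distance from the array centre."""
    lam = C / freq
    d0 = math.hypot(*src)
    return [
        cmath.exp(-2j * math.pi * (math.hypot(src[0] - x, src[1] - y) - d0) / lam)
        for x, y in positions
    ]


def steering_planar(positions, azimuth, freq):
    """Far-field model: plane wave arriving from `azimuth`, phase linear in position."""
    lam = C / freq
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    return [cmath.exp(2j * math.pi * (x * ux + y * uy) / lam) for x, y in positions]


def rayleigh_distance(aperture, freq):
    """Rule-of-thumb far-field boundary 2 * D^2 / lambda for an aperture D in metres."""
    return 2 * aperture ** 2 / (C / freq)
```

For a source many Rayleigh distances away the two models agree closely; moving the source to within a few array radii makes the per-element phases diverge, which is the mismatch a near-field channel model has to capture.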
Abstract:A digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, and it is expected to provide significant performance enhancements for sixth-generation (6G) air-interface design. In this work, we first define five evolution levels of channel twins that track the progression of wireless communication. The fifth level, autonomous DTC, is elaborated in terms of multi-dimensional factors such as methodology, characterization precision, and data category. Then, we provide detailed insights into the requirements and architecture of a complete DTC for 6G. Subsequently, we present a sensing-enhanced real-time channel prediction platform along with experimental validations. Finally, drawing on the vision of the 6G network, we explore potential applications and open issues for future DTC research.
Abstract:Due to distribution shift, deep-learning-based image dehazing methods suffer from performance degradation when applied to real-world hazy images. In this paper, we consider a dehazing framework based on conditional diffusion models for improved generalization to real haze. First, we find that optimizing the training objective of diffusion models, i.e., predicting Gaussian noise vectors, is non-trivial. The spectral bias of deep networks hinders the learning of the higher-frequency modes in these Gaussian vectors and hence impairs the reconstruction of image details. To tackle this issue, we design a network unit, named Frequency Compensation block (FCB), with a bank of filters that jointly emphasize the mid-to-high frequencies of an input signal. We demonstrate that diffusion models with FCB achieve significant gains in both perceptual and distortion metrics. Second, to further boost generalization performance, we propose a novel data synthesis pipeline, HazeAug, that augments haze in terms of both degree and diversity. Within this framework, we set up a solid baseline for blind dehazing in which models are trained on synthetic hazy-clean pairs and generalize directly to real data. Extensive evaluations show that the proposed dehazing diffusion model significantly outperforms state-of-the-art methods on real-world images.
Abstract:In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks, with the following main findings. First, in the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and are sometimes even better than small SOTA models fine-tuned specifically for each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, in the few-shot setting, adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples sometimes induce unstable or even worse performance. Furthermore, the widely used BM25-based shot selection strategy significantly outperforms basic random or fixed selection only on generation problems. Third, in the fine-tuning setting, fine-tuning can further improve model performance on downstream code comprehension and generation tasks compared with zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similarly sized LLMs without instruction tuning. Based on these findings, we present practical implications regarding model and usage recommendations, performance and cost trade-offs, and future directions.
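The BM25-based shot selection compared in the findings above ranks candidate demonstrations by lexical similarity to the test input and keeps the top k. A minimal sketch follows, assuming Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75, a Lucene-style IDF, and naive whitespace tokenization; `BM25`, `select_shots`, and the toy pool in the usage example are hypothetical and are not the study's actual pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption); code inputs may need a code-aware one.
    return text.lower().split()


class BM25:
    def __init__(self, corpus, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in corpus]
        self.k1, self.b = k1, b
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        self.tf = [Counter(d) for d in self.docs]
        df = Counter()
        for d in self.docs:
            df.update(set(d))  # document frequency: count each term once per doc
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0) for t, n in df.items()}

    def score(self, query, idx):
        tf, dl = self.tf[idx], len(self.docs[idx])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total


def select_shots(query, pool, k=2):
    """Return the k pool examples with the highest BM25 score against the query."""
    bm25 = BM25(pool)
    ranked = sorted(range(len(pool)), key=lambda i: bm25.score(query, i), reverse=True)
    return [pool[i] for i in ranked[:k]]
```

For example, with the pool `["reverse a string in python", "sort a list of integers", "read a file line by line", "reverse a linked list"]`, the query `"reverse the words in a string"` selects the two "reverse" examples. Random selection would instead draw k examples uniformly from the pool, and fixed selection would reuse the same k for every query.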
Abstract:Technology research and standardization work for the sixth generation (6G) has been carried out worldwide. Channel research is a prerequisite for 6G technology evaluation and optimization. This paper presents a survey and tutorial on channel measurement, modeling, and simulation for 6G. We first highlight the channel challenges for 6G systems, including higher frequency bands, extremely large antenna arrays, new technology combinations, and diverse application scenarios. We then review channel measurement and modeling for four possible 6G enabling technologies: terahertz communication, massive multiple-input multiple-output communication, joint communication and sensing, and reconfigurable intelligent surfaces. Finally, we introduce a 6G channel simulation platform and provide examples of its implementation. The goal of this paper is to help both professionals and non-professionals follow the progress of 6G channel research, understand the 6G channel model, and use it for 6G simulation.
Abstract:Joint communication and sensing (JCAS) has been recognized as a promising technology for sixth-generation (6G) communication. A realistic channel model is a prerequisite for designing JCAS systems. Most existing channel models generate the communication and sensing channels independently under the same framework. However, because hardware resources (e.g., antennas) are multiplexed and the environment is the same, signals used for communication and sensing may experience some shared propagation scatterers. This practical sharing feature necessitates the joint generation of communication and sensing channels for realistic modeling, in which the shared clusters (contributed by the shared scatterers) should be jointly reconstructed for both channels. In this paper, we first conduct communication and sensing channel measurements in an indoor scenario at 28 GHz. The power-angular-delay profiles (PADPs) of multipath components (MPCs) are obtained, and the scatterers shared by the communication and sensing channels can be observed directly in these profiles. Then, a stochastic JCAS channel model is proposed to capture the sharing feature, in which shared and non-shared clusters of the two channels are defined and superimposed. To extract these clusters from measured JCAS channels, a novel KPowerMeans-based joint clustering algorithm (KPM-JCA) is introduced. Finally, stochastic channel characteristics are analyzed, and the practicality and controllability of the proposed model are validated based on the measurements and empirical simulations. The proposed model can realistically capture the sharing feature of JCAS channels, which is valuable for the design and deployment of JCAS systems.
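KPowerMeans, on which the KPM-JCA above builds, clusters MPCs with a k-means variant in which each centroid is the power-weighted mean of its members, so strong paths dominate where clusters sit. The sketch below illustrates only that basic idea, not the paper's joint algorithm: it assumes MPCs are given as (delay, azimuth) pairs already scaled to comparable units, uses plain squared Euclidean distance instead of the multipath component distance, ignores angle wrap-around, and uses a deterministic farthest-point initialization seeded at the strongest MPC. `kpowermeans` is a hypothetical helper name.

```python
def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kpowermeans(points, powers, k, iters=100):
    """points: list of (delay, azimuth) tuples in comparable units (assumed).
    powers: linear MPC powers used as weights. Returns (labels, centroids)."""
    # Assumed deterministic init: start at the strongest MPC, then repeatedly add
    # the MPC that is farthest from every centroid chosen so far.
    first = max(range(len(points)), key=lambda i: powers[i])
    centroids = [list(points[first])]
    while len(centroids) < k:
        far = max(range(len(points)),
                  key=lambda i: min(_sqdist(points[i], c) for c in centroids))
        centroids.append(list(points[far]))

    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid. Scaling each distance by the MPC's own
        # power, as KPowerMeans does, leaves the argmin unchanged, so it is omitted.
        new_labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: power-weighted centroid, so strong MPCs pull harder.
        for j in range(k):
            members = [i for i in range(len(points)) if labels[i] == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = sum(powers[i] for i in members)
            centroids[j] = [sum(powers[i] * points[i][d] for i in members) / total
                            for d in range(len(points[0]))]
    return labels, centroids
```

Extending this toward the joint setting would mean clustering communication and sensing MPCs together and then labeling each cluster as shared or non-shared, which the abstract does not specify in enough detail to sketch here.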
Abstract:Semantic localization (SeLo) refers to the task of obtaining the most relevant locations in large-scale remote sensing (RS) images using semantic information such as text. As an emerging task based on cross-modal retrieval, SeLo achieves semantic-level retrieval with only caption-level annotation, which demonstrates its great potential for unifying downstream tasks. Although SeLo has been explored in successive works, no work has yet systematically explored and analyzed this urgent direction. In this paper, we thoroughly study this field and provide a complete benchmark, in terms of metrics and test data, to advance the SeLo task. First, based on the characteristics of this task, we propose multiple discriminative evaluation metrics to quantify SeLo performance. The devised significant area proportion, attention shift distance, and discrete attention distance are used to evaluate the generated SeLo map at the pixel level and the region level. Next, to provide standard evaluation data for the SeLo task, we contribute a diverse, multi-semantic, multi-objective Semantic Localization Testset (AIR-SLT). AIR-SLT consists of 22 large-scale RS images and 59 test cases with different semantics, and aims to provide a comprehensive evaluation of retrieval models. Finally, we analyze the SeLo performance of RS cross-modal retrieval models in detail, explore the impact of different variables on this task, and provide a complete benchmark for the SeLo task. We also establish a new paradigm for RS referring expression comprehension and demonstrate the great advantage of SeLo in semantics by combining it with tasks such as detection and road extraction. The proposed evaluation metrics, semantic localization testsets, and corresponding scripts are openly available at github.com/xiaoyuan1996/SemanticLocalizationMetrics .
Abstract:Few-shot segmentation of point clouds remains a challenging task, as there is no effective way to convert local point cloud information into a global representation, which hinders the generalization ability of point features. In this study, we propose a bidirectional feature globalization (BFG) approach, which leverages the similarity between point features and prototype vectors to embed global perception into local point features in a bidirectional fashion. With point-to-prototype globalization (Po2PrG), BFG aggregates local point features into prototypes according to similarity weights from dense point features to sparse prototypes. With prototype-to-point globalization (Pr2PoG), global perception is embedded into local point features based on similarity weights from sparse prototypes to dense point features. The sparse prototypes of each class, embedded with global perception, are summarized into a single prototype for few-shot 3D segmentation within a metric learning framework. Extensive experiments on S3DIS and ScanNet demonstrate that BFG significantly outperforms state-of-the-art methods.
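The two globalization directions described above can be sketched as softmax-normalized similarity weighting between point features and prototypes. This is a minimal sketch under stated assumptions, not the authors' implementation: dot-product similarity, a softmax over points for Po2PrG and over prototypes for Pr2PoG, a residual addition in Pr2PoG, and mean pooling for the per-class summary are all choices made here for illustration, and every function name is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def po2prg(points, prototypes):
    """Point-to-prototype: each prototype becomes a similarity-weighted average
    of all point features (softmax taken over the dense points)."""
    return [weighted_sum(softmax([dot(p, x) for x in points]), points) for p in prototypes]


def pr2pog(points, prototypes):
    """Prototype-to-point: each point receives a similarity-weighted mix of the
    sparse prototypes (softmax over prototypes), added as a residual (assumed)."""
    out = []
    for x in points:
        g = weighted_sum(softmax([dot(x, p) for p in prototypes]), prototypes)
        out.append([a + b for a, b in zip(x, g)])
    return out


def bfg(points, prototypes):
    """Apply Po2PrG, then Pr2PoG with the globalized prototypes."""
    protos = po2prg(points, prototypes)
    return pr2pog(points, protos), protos


def class_prototype(protos):
    """Summarize a class's sparse prototypes into one vector (mean pooling, assumed)."""
    n = len(protos)
    return [sum(p[d] for p in protos) / n for d in range(len(protos[0]))]
```

Because the Po2PrG weights sum to one, each updated prototype is a convex combination of the point features, which is what lets it carry scene-level rather than purely local information.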
Abstract:Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for its flexible input and efficient querying. However, traditional methods ignore the multi-scale and redundant targets characteristic of RS images, which degrades retrieval accuracy. To cope with multi-scale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN). Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features. AMFMN employs a multi-scale visual self-attention (MVSA) module to extract the salient features of RS images and uses visual features to guide the text representation. Furthermore, to alleviate the ambiguity of positive samples caused by the strong intraclass similarity of RS images, we propose a triplet loss function with a dynamic variable margin based on the prior similarity of sample pairs. Finally, unlike traditional RS image-text datasets with coarse text and higher intraclass similarity, we construct a fine-grained and more challenging Remote sensing Image-Text Match dataset (RSITMD), which supports RS image retrieval through keywords and sentences, both separately and jointly. Experiments on four RS text-image datasets demonstrate that the proposed model achieves state-of-the-art performance in the cross-modal RS text-image retrieval task.
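The triplet loss with a dynamic variable margin mentioned above can be illustrated as a hinge over cosine similarities whose margin depends on a prior similarity score for each (anchor, negative) pair. The abstract does not give AMFMN's exact margin rule, so `dynamic_margin` below is a hypothetical schedule: it assumes, purely for illustration, that the margin shrinks linearly as the prior similarity (in [0, 1]) grows, with a small floor, so negatives that are a priori close to the anchor are penalized less harshly.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def dynamic_margin(prior_sim, base=0.2, floor=0.05):
    """Hypothetical schedule: the more similar a negative is to the anchor a priori,
    the smaller the margin by which it must be separated (never below `floor`)."""
    return max(floor, base * (1.0 - prior_sim))


def triplet_loss(anchor, positive, negatives, prior_sims, base=0.2):
    """Sum of hinge terms max(0, m_n - s(a, p) + s(a, n)) over the negatives,
    where each margin m_n depends on the prior similarity of (anchor, negative)."""
    s_pos = cosine(anchor, positive)
    loss = 0.0
    for neg, prior in zip(negatives, prior_sims):
        m = dynamic_margin(prior, base)
        loss += max(0.0, m - s_pos + cosine(anchor, neg))
    return loss
```

With a fixed margin, a near-duplicate negative would be pushed away as hard as an unrelated one; tying the margin to a prior similarity is one way to soften that for the highly similar scenes common in RS image-text data.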