School of Computer Science, Tianjin University
Abstract:Video semantic segmentation (VSS) has been widely employed in many fields, such as simultaneous localization and mapping, autonomous driving, and surveillance. Its core challenge is how to leverage temporal information to achieve better segmentation. Previous efforts have primarily focused on pixel-level static-dynamic context matching, utilizing techniques such as optical flow and attention mechanisms. Instead, this paper rethinks static-dynamic contexts at the class level and proposes a novel static-dynamic class-level perceptual consistency (SD-CPC) framework. In this framework, we propose a multivariate class prototype with contrastive learning and a static-dynamic semantic alignment module. The former provides class-level constraints for the model, obtaining personalized inter-class features and diversified intra-class features. The latter first establishes intra-frame spatial multi-scale and multi-level correlations to achieve static semantic alignment. Then, based on cross-frame static perceptual differences, it performs two-stage cross-frame selective aggregation to achieve dynamic semantic alignment. Meanwhile, we propose a window-based attention map calculation method that leverages the sparsity of attention points during cross-frame aggregation to reduce computational cost. Extensive experiments on the VSPW and Cityscapes datasets show that the proposed approach outperforms state-of-the-art methods. Our implementation will be open-sourced on GitHub.
Abstract:Traditional recommendation systems focus on maximizing user satisfaction by suggesting users' favorite items. This user-centric approach may lead to an unfair exposure distribution among providers. Conversely, a provider-centric design might become unfair to users. Therefore, this paper proposes a re-ranking model, FairSort (code and datasets are available at https://github.com/13543024276/FairSort), to find a trade-off among user-side fairness, provider-side fairness, and personalized recommendation utility. Previous works habitually treat this issue as a knapsack problem, incorporating both-side fairness as constraints. In this paper, we adopt a novel perspective, treating each recommendation list as a runway rather than a knapsack. In this perspective, each item on the runway gains a velocity and runs within a specific time, achieving re-ranking for both-side fairness. Meanwhile, we ensure the Minimum Utility Guarantee for personalized recommendations by designing a Binary Search approach. This can provide more reliable recommendations than the conventional greedy strategy based on the knapsack problem. We further broaden the applicability of FairSort by designing two versions for online and offline recommendation scenarios. Theoretical analysis and extensive experiments on real-world datasets indicate that FairSort can ensure more reliable personalized recommendations while considering fairness for both providers and users.
Abstract:With the rapid application of unmanned aerial vehicles (UAVs) in urban areas, the identification and tracking of hovering UAVs have become critical challenges, significantly impacting the safety of aircraft take-off and landing operations. As a promising technology for 6G mobile systems, integrated sensing and communication (ISAC) can be used to detect high-mobility UAVs with a low deployment cost. The micro-Doppler signals from UAV rotors can be leveraged to address the detection of low-mobility and hovering UAVs using ISAC signals. However, determining whether the frame structure of the ISAC system can be used to identify UAVs, and how to accurately capture the weak rotor micro-Doppler signals of UAVs in complex environments, remain two challenging problems. This paper first proposes a novel frame structure for UAV micro-Doppler extraction and the representation of UAV micro-Doppler signals within the channel state information (CSI). Furthermore, to address complex environments and the interference caused by UAV body vibrations, the rotor micro-Doppler null space pursuit (rmD-NSP) algorithm and the synchroextracting transform (SET) feature extraction algorithm are designed to effectively separate the UAV's rotor micro-Doppler signals and enhance their features in the spectrogram. Finally, both simulations and hardware testbed experiments demonstrate that the proposed rmD-NSP algorithm enables the ISAC base station (BS) to accurately and completely extract the UAV's rotor micro-Doppler signals. Within a 0.1 s observation period, the ISAC BS successfully captures eight rotations of the DJI M300 RTK UAV's rotor in urban environments. Compared to the existing AM-FM NSP and NSP signal decomposition algorithms, the integrity of the rotor micro-Doppler features is improved by 60%.
Abstract:Multiple-choice question answering (MCQA) becomes particularly challenging when all choices are relevant to the question and are semantically similar. Yet this setting of MCQA can potentially provide valuable clues for choosing the right answer. Existing models often rank each choice separately, overlooking the context provided by other choices. Specifically, they fail to leverage the semantic commonalities and nuances among the choices for reasoning. In this paper, we propose DCQA, a novel MCQA model that differentiates choices by identifying and eliminating their commonality. Our model captures the token-level attention of each choice to the question, and separates the question tokens attended to by all the choices (i.e., commonalities) from those attended to by individual choices (i.e., nuances). Using the nuances as refined contexts for the choices, our model can effectively differentiate choices with subtle differences and provide justifications for choosing the correct answer. We conduct comprehensive experiments across five commonly used MCQA benchmarks, demonstrating that DCQA consistently outperforms baseline models. Furthermore, our case study illustrates the effectiveness of the approach in directing the attention of the model to more differentiating features.
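The separation of commonalities from nuances described above can be illustrated with a minimal sketch. It is not the DCQA implementation: the function name, the fixed attention threshold, and the toy attention matrix are all assumptions made here for illustration, and the abstract does not specify how "attended to" is decided.

```python
def split_commonality_and_nuance(attention, threshold=0.1):
    """Split question tokens into commonalities and per-choice nuances.

    attention[c][t] is the (hypothetical) attention weight that choice c
    places on question token t. A token counts as "attended" when its
    weight reaches `threshold`; this cutoff is an assumption of the sketch.
    """
    # Set of question-token indices each choice attends to.
    attended = [
        {t for t, weight in enumerate(row) if weight >= threshold}
        for row in attention
    ]
    # Commonalities: tokens attended to by every choice.
    common = set.intersection(*attended)
    # Nuances: tokens a choice attends to that are not shared by all choices.
    nuances = [sorted(tokens - common) for tokens in attended]
    return sorted(common), nuances


# Toy example: three choices, four question tokens.
toy_attention = [
    [0.50, 0.30, 0.20, 0.00],
    [0.40, 0.05, 0.15, 0.40],
    [0.60, 0.05, 0.30, 0.05],
]
common_tokens, nuance_tokens = split_commonality_and_nuance(toy_attention)
print(common_tokens)  # tokens shared by all choices
print(nuance_tokens)  # remaining attended tokens, one list per choice
```

In this toy setting, the shared tokens would be dropped and each choice would be scored only against its own remaining tokens, which mirrors the idea of using nuances as refined contexts.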
Abstract:The integrated sensing and communication (ISAC) system under a multi-input multi-output (MIMO) architecture achieves the dual functionalities of sensing and communication on the same platform by utilizing spatial gain, providing a feasible paradigm for coping with spectrum congestion. However, operating sensing and communication simultaneously on the same platform causes severe interference in ISAC systems. To address this challenge, we propose a joint optimization framework for transmit beamforming and receive filter design for ISAC systems with a MIMO architecture. We aim to maximize the signal-to-clutter-plus-noise ratio (SCNR) at the receiver while considering various constraints such as waveform similarity, power budget, and communication performance requirements to ensure the integration of the dual functionalities. In particular, the overall transmit beamforming is decomposed into sensing beamforming and communication beamforming, and a quadratic transformation (QT) is introduced to relax and convert the complex non-convex optimization objective. An efficient algorithm based on covariance matrix tapers (CMT) is proposed to restructure the clutter covariance matrix under a mismatched steering vector, thereby improving the robustness of the ISAC transceiver design. Numerical simulations are provided to demonstrate the effectiveness of the proposed algorithm.
Abstract:A novel modular extremely large-scale multiple-input multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and planar wavefront model. Considering the hybrid digital-analog structure inherent to modular arrays, we formulate a joint analog-digital beamforming design problem based on the communication spectral efficiency and sensing signal-to-clutter-plus-noise ratio (SCNR). By exploring the structural similarity of the communication and sensing channels, it is proved that the optimal transmit covariance matrix lies in the subspace spanned by the subarray response vectors, yielding a closed-form solution for the optimal analog beamformer. Consequently, the joint design problem is transformed into a low-dimensional rank-constrained digital beamformer optimization. We first propose a manifold optimization method that directly optimizes the digital beamformer on the rank-constrained Stiefel manifold. Additionally, we develop a semidefinite relaxation (SDR)-based approach that relaxes the rank constraint and employs a randomization technique to obtain a near-optimal solution. Simulation results demonstrate the effectiveness of the proposed modular XL-MIMO ISAC framework and algorithms.
Abstract:Recognizing relations between two words is a fundamental task with broad applications. Unlike extracting relations from text, identifying relations among words without their contexts is difficult. It becomes even more difficult for long-tail relations due to inadequate semantic features. Existing approaches based on language models (LMs) utilize the rich knowledge of LMs to enhance the semantic features of relations. However, they capture common relations while overlooking less frequent but meaningful ones, since the knowledge of LMs relies heavily on training data, which often represents common relations, whereas long-tail relations are often uncommon in training data. Using external knowledge to enrich LMs is interesting but not trivial, because collecting a corpus containing long-tail relations is hardly feasible. In this paper, we propose a sememe knowledge enhanced method (SememeLM) to enhance the representation of long-tail relations, in which sememes can break the contextual constraints between words. First, we present a sememe relation graph and propose a graph encoding method. Moreover, since an external knowledge base may contain a large amount of irrelevant knowledge, noise is introduced. We therefore propose a consistency alignment module, which aligns the introduced knowledge with LMs, reduces the noise, and integrates the knowledge into the language model. Finally, we conduct experiments on word analogy datasets, which evaluate the ability to distinguish subtle differences among relation representations, including long-tail relations. Extensive experiments show that our approach outperforms some state-of-the-art methods.
Abstract:Physical layer security (PLS) technology based on the fixed-position antenna (FPA) has attracted widespread attention. Because the antenna positions are fixed, current FPA-based PLS schemes cannot fully utilize the spatial degrees of freedom, and thus the secure gain in the desired/undesired direction may be weakened. Different from the concept of FPA, the mobile antenna (MA) is a novel technology that reconfigures the wireless channels and enhances the corresponding capacity through the flexible movement of antennas on a small scale. MA-empowered PLS has great potential and deserves further investigation. In this paper, we investigate, for the first time, the secrecy performance of an MA-enabled PLS system in which an MA-based Alice transmits confidential information to multiple single-antenna Bobs in the presence of a single-antenna eavesdropper (Eve) and in the absence of perfect channel state information (CSI). To maximize the secrecy rate of the worst Bob, we jointly design the transmit beamforming and antenna positions at Alice, subject to the minimum moving distance of the antenna, the CSI uncertainty of Eve, and the maximum transmit power. Furthermore, projected gradient ascent (PGA), alternating optimization (AO), and simulated annealing (SA) are adopted to handle the non-convexity of the secrecy rate maximization problem. Simulation results demonstrate the effectiveness and correctness of the proposed method. In particular, the MA-enabled PLS scheme can significantly enhance the secrecy rate compared to the conventional FPA-based ones for different settings of key system parameters.
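For orientation, the worst-Bob secrecy rate mentioned above is commonly written in the standard form below. This is a generic textbook-style expression, not the exact formulation of the paper. All symbols are assumptions introduced here: K is the number of Bobs, gamma_k is the received signal-to-interference-plus-noise ratio at the k-th Bob, gamma_E is the corresponding ratio at Eve, and [x]^+ denotes max(x, 0).

```latex
% Generic worst-case (minimum over Bobs) secrecy rate; symbols are illustrative.
R_{\mathrm{sec}}
  = \min_{k \in \{1,\dots,K\}}
    \Bigl[ \log_2\bigl(1 + \gamma_k\bigr) - \log_2\bigl(1 + \gamma_E\bigr) \Bigr]^{+},
\qquad [x]^{+} = \max(x, 0).
```

Under imperfect CSI of Eve, a robust design would typically evaluate gamma_E at its worst case over the assumed uncertainty set, which is one source of the non-convexity noted in the abstract.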
Abstract:As the size and context length of Large Language Models (LLMs) grow, weight-activation quantization has emerged as a crucial technique for efficient deployment of LLMs. Compared to weight-only quantization, weight-activation quantization presents greater challenges due to the presence of outliers in activations. Existing methods have made significant progress by exploring mixed-precision quantization and outlier suppression. However, these methods primarily focus on optimizing the results of a single matrix multiplication, neglecting the bidirectional propagation of quantization errors in LLMs. Specifically, errors accumulate vertically within the same token through layers, and diffuse horizontally across different tokens due to self-attention mechanisms. To address this issue, we introduce BiSup, a Bidirectional quantization error Suppression method. By constructing appropriate optimizable parameter spaces, BiSup utilizes a small amount of data for quantization-aware parameter-efficient fine-tuning to suppress the vertical accumulation of errors. In addition, BiSup employs a prompt mixed-precision quantization strategy, which preserves high precision for the key-value cache of system prompts, to mitigate the horizontal diffusion of errors. Extensive experiments on the Llama and Qwen families demonstrate that BiSup can improve performance over two state-of-the-art methods (the average WikiText2 perplexity decreases from 13.26 to 9.41 for Atom and from 14.33 to 7.85 for QuaRot under the W3A3-g128 configuration), further facilitating the practical applications of low-bit weight-activation quantization.
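The idea of keeping the key-value cache of system-prompt tokens at higher precision can be sketched as follows. This is a toy illustration, not the BiSup implementation: the symmetric per-row quantizer, the bit widths (16 and 3), and every function name are assumptions chosen here for clarity.

```python
def fake_quantize(values, bits):
    """Symmetric uniform quantize-then-dequantize of one row of floats."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    levels = 2 ** (bits - 1) - 1      # e.g. 3 positive levels for 3 bits
    scale = max_abs / levels          # step size between adjacent levels
    return [round(v / scale) * scale for v in values]


def quantize_kv_cache(kv_rows, num_prompt_tokens, prompt_bits=16, other_bits=3):
    """Quantize a KV cache row by row (one row per token position).

    The first `num_prompt_tokens` rows (assumed to be the system prompt)
    keep `prompt_bits`; all later rows use the lower `other_bits`.
    """
    return [
        fake_quantize(row, prompt_bits if i < num_prompt_tokens else other_bits)
        for i, row in enumerate(kv_rows)
    ]


def max_abs_error(original, quantized):
    """Largest element-wise deviation introduced by quantization."""
    return max(abs(a - b) for a, b in zip(original, quantized))


# Toy cache: the same values stored at a prompt position and a later position.
row = [0.11, -0.52, 0.93, 0.30]
cache = [row, row]
quantized = quantize_kv_cache(cache, num_prompt_tokens=1)
prompt_error = max_abs_error(cache[0], quantized[0])
other_error = max_abs_error(cache[1], quantized[1])
print(prompt_error, other_error)
```

Because every later token attends to the system prompt, keeping those rows close to their original values limits how much quantization error can spread to other positions, at the cost of a slightly larger cache.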
Abstract:Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance. Nevertheless, the communications and computing in MEC-enabled NTNs are strongly coupled, which complicates the system design. In this paper, an edge information hub (EIH) that incorporates communication, computing and storage capabilities is proposed to synergize communication and computing and enable systematic design. We first address the joint data scheduling and resource orchestration problem to minimize the latency for uploading sensing data. The problem is solved using an optimal resource orchestration algorithm. On that basis, we propose the principles for resource configuration of the EIH considering payload constraints on size, weight and energy supply. Simulation results demonstrate the superiority of our proposed scheme in reducing the overall upload latency, thus enabling quick emergency rescue.