Abstract:Multimodal information (e.g., visual, acoustic, and textual) has been widely used to enhance representation learning for micro-video recommendation. For integrating multimodal information into a joint representation of micro-video, multimodal fusion plays a vital role in the existing micro-video recommendation approaches. However, the static multimodal fusion used in previous studies is insufficient to model the various relationships among multimodal information of different micro-videos. In this paper, we develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF), which dynamically assigns parameters to the multimodal fusion function for each micro-video during its representation learning. Specifically, MetaMMF regards the multimodal fusion of each micro-video as an independent task. Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner. We perform extensive experiments on three benchmark datasets, demonstrating the significant improvements over several state-of-the-art multimodal recommendation models, like MMGCN, LATTICE, and InvRL. Furthermore, we lighten our model by adopting canonical polyadic decomposition to improve the training efficiency, and validate its effectiveness through experimental results. Codes are available at https://github.com/hanliu95/MetaMMF.
Abstract:Integrated Sensing and Communications (ISAC) is expected to play a pivotal role in future 6G networks. To maximize time-frequency resource utilization, 6G ISAC systems must exploit data payload signals, that are inherently random, for both communication and sensing tasks. This paper provides a comprehensive analysis of the sensing performance of such communication-centric ISAC signals, with a focus on modulation and pulse shaping design to reshape the statistical properties of their auto-correlation functions (ACFs), thereby improving the target ranging performance. We derive a closed-form expression for the expectation of the squared ACF of random ISAC signals, considering arbitrary modulation bases and constellation mappings within the Nyquist pulse shaping framework. The structure is metaphorically described as an ``iceberg hidden in the sea", where the ``iceberg'' represents the squared mean of the ACF of random ISAC signals, that is determined by the pulse shaping filter, and the ``sea level'' characterizes the corresponding variance, caused by the randomness of the data payload. Our analysis shows that, for QAM/PSK constellations with Nyquist pulse shaping, Orthogonal Frequency Division Multiplexing (OFDM) achieves the lowest ranging sidelobe level across all lags. Building on these insights, we propose a novel Nyquist pulse shaping design to enhance the sensing performance of random ISAC signals. Numerical results validate our theoretical findings, showing that the proposed pulse shaping significantly reduces ranging sidelobes compared to conventional root-raised cosine (RRC) pulse shaping, thereby improving the ranging performance.
Abstract:Since high resolution remote sensing image classification often requires a relatively high computation complexity, lightweight models tend to be practical and efficient. Model pruning is an effective method for model compression. However, existing methods rarely take into account the specificity of remote sensing images, resulting in significant accuracy loss after pruning. To this end, we propose an effective structural pruning approach for remote sensing image classification. Specifically, a pruning strategy that amplifies the differences in channel importance of the model is introduced. Then an adaptive mining loss function is designed for the fine-tuning process of the pruned model. Finally, we conducted experiments on two remote sensing classification datasets. The experimental results demonstrate that our method achieves minimal accuracy loss after compressing remote sensing classification models, achieving state-of-the-art (SoTA) performance.
Abstract:Many researchers collect data from the internet through crowd-sourcing or web crawling to alleviate the data-hungry challenge associated with cross-modal matching. Although such practice does not require expensive annotations, it inevitably introduces mismatched pairs and results in a noisy correspondence problem. Current approaches leverage the memorization effect of deep neural networks to distinguish noise and perform re-weighting. However, briefly lowering the weight of noisy pairs cannot eliminate the negative impact of noisy correspondence in the training process. In this paper, we propose a novel self-drop and dual-weight approach, which achieves elaborate data processing by qua-partitioning the data. Specifically, our approach partitions all data into four types: clean and significant, clean yet insignificant, vague, and noisy. We analyze the effect of noisy and clean data pairs and find that for vision-language pre-training models, a small number of clean samples is more valuable than a majority of noisy ones. Based on this observation, we employ self-drop to discard noisy samples to effectively mitigate the impact of noise. In addition, we adopt a dual-weight strategy to ensure that the model focuses more on significant samples while appropriately leveraging vague samples. Compared to the prior works, our approach is more robust and demonstrates relatively more stable performance on noisy datasets, especially under a high noise ratio. Extensive experiments on three widely used datasets, including Flickr30K, MS-COCO, and Conceptual Captions, validate the effectiveness of our approach. The source code is available at https://github.com/DongChenwei2000/SDD.
Abstract:This paper presents an overview on intelligent reflecting surface (IRS)-enabled sensing and communication for the forthcoming sixth-generation (6G) wireless networks, in which IRSs are strategically deployed to proactively reconfigure wireless environments to improve both sensing and communication (S&C) performance. First, we exploit a single IRS to enable wireless sensing in the base station's (BS's) non-line-of-sight (NLoS) area. In particular, we present three IRS-enabled NLoS target sensing architectures with fully-passive, semi-passive, and active IRSs, respectively. We compare their pros and cons by analyzing the fundamental sensing performance limits for target detection and parameter estimation. Next, we consider a single IRS to facilitate integrated sensing and communication (ISAC), in which the transmit signals at the BS are used for achieving both S&C functionalities, aided by the IRS through reflective beamforming. We present joint transmit signal and receiver processing designs for realizing efficient ISAC, and jointly optimize the transmit beamforming at the BS and reflective beamforming at the IRS to balance the fundamental performance tradeoff between S&C. Furthermore, we discuss multi-IRS networked ISAC, by particularly focusing on multi-IRS-enabled multi-link ISAC, multi-region ISAC, and ISAC signal routing, respectively. Finally, we highlight various promising research topics in this area to motivate future work.
Abstract:Orthogonal time frequency space (OTFS) modulation is anticipated to be a promising candidate for supporting integrated sensing and communications (ISAC) systems, which is considered as a pivotal technique for realizing next generation wireless networks. In this paper, we develop a minimum bit error rate (BER) precoder design for an OTFS-based ISAC system. In particular, the BER minimization problem takes into account the maximum available transmission power budget and the required sensing performance. Different from prior studies that considered ISAC in the time-frequency (TF) domain, we devise the precoder from the perspective of the delay-Doppler (DD) domain by exploiting the equivalent DD domain channel due to the fact that the DD domain channel generally tends to be sparse and quasi-static, which can facilitate a low-overhead ISAC system design. To address the non-convex optimization design problem, we resort to optimizing the lower bound of the derived average BER by adopting Jensen's inequality. Subsequently, the formulated problem is decoupled into two independent sub-problems via singular value decomposition (SVD) methodology. We then theoretically analyze the feasibility conditions of the proposed problem and present a low-complexity iterative solution via leveraging the Lagrangian duality approach. Simulation results verify the effectiveness of our proposed precoder compared to the benchmark schemes and reveal the interplay between sensing and communication for dual-functional precoder design, indicating a trade-off where transmission efficiency is sacrificed for increasing transmission reliability and sensing accuracy.
Abstract:Impulse radio ultra-wideband (IR-UWB) signals stand out for their high temporal resolution, low cost, and large bandwidth, making them a highly promising option for integrated sensing and communication (ISAC) systems. In this paper, we design an ISAC system for a bi-static passive sensing scenario that accommodates multiple targets. Specifically, we introduce two typical modulation schemes, PPM and BPSK, for data transmission. The essential coupling between sensing and communication is examined through the Fisher information matrix (FIM). Accordingly, we introduce a pilot-based decoupling approach that relies on known time-delays, as well as a differential decoupling strategy that uses a known starting symbol position. Finally, we assess the sensing and communication performance under various modulation and demodulation schemes under the constraints of current UWB standards. This assessment utilizes the Cramer-Rao Lower Bound (CRLB) for sensing and the Shannon capacity limit for communication, offering theoretical insights into choosing suitable data signal processing methods in real-world applications.
Abstract:Despite advancements in enhancing LLM safety against jailbreak attacks, evaluating LLM defenses remains a challenge, with current methods often lacking explainability and generalization to complex scenarios, leading to incomplete assessments (e.g., direct judgment without reasoning, low F1 score of GPT-4 in complex cases, bias in multilingual scenarios). To address this, we present JAILJUDGE, a comprehensive benchmark featuring diverse risk scenarios, including synthetic, adversarial, in-the-wild, and multilingual prompts, along with high-quality human-annotated datasets. The JAILJUDGE dataset includes over 35k+ instruction-tune data with reasoning explainability and JAILJUDGETEST, a 4.5k+ labeled set for risk scenarios, and a 6k+ multilingual set across ten languages. To enhance evaluation with explicit reasoning, we propose the JailJudge MultiAgent framework, which enables explainable, fine-grained scoring (1 to 10). This framework supports the construction of instruction-tuning ground truth and facilitates the development of JAILJUDGE Guard, an end-to-end judge model that provides reasoning and eliminates API costs. Additionally, we introduce JailBoost, an attacker-agnostic attack enhancer, and GuardShield, a moderation defense, both leveraging JAILJUDGE Guard. Our experiments demonstrate the state-of-the-art performance of JailJudge methods (JailJudge MultiAgent, JAILJUDGE Guard) across diverse models (e.g., GPT-4, Llama-Guard) and zero-shot scenarios. JailBoost and GuardShield significantly improve jailbreak attack and defense tasks under zero-shot settings, with JailBoost enhancing performance by 29.24% and GuardShield reducing defense ASR from 40.46% to 0.15%.
Abstract:The Direct Segment Anything Model (DirectSAM) excels in class-agnostic contour extraction. In this paper, we explore its use by applying it to optical remote sensing imagery, where semantic contour extraction-such as identifying buildings, road networks, and coastlines-holds significant practical value. Those applications are currently handled via training specialized small models separately on small datasets in each domain. We introduce a foundation model derived from DirectSAM, termed DirectSAM-RS, which not only inherits the strong segmentation capability acquired from natural images, but also benefits from a large-scale dataset we created for remote sensing semantic contour extraction. This dataset comprises over 34k image-text-contour triplets, making it at least 30 times larger than individual dataset. DirectSAM-RS integrates a prompter module: a text encoder and cross-attention layers attached to the DirectSAM architecture, which allows flexible conditioning on target class labels or referring expressions. We evaluate the DirectSAM-RS in both zero-shot and fine-tuning setting, and demonstrate that it achieves state-of-the-art performance across several downstream benchmarks.
Abstract:This paper introduces an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for smart healthcare systems. The framework is evaluated in the context of smart healthcare, optimising the transmit beamforming matrix and semantic extraction ratio for improved data rates, sensing accuracy, and general data protection regulation (GDPR) compliance, while considering IoRT device computing capabilities. Semantic metrics such as semantic transmission rate and semantic secrecy rate are derived to evaluate data rate performance and GDPR risk, respectively, while the Cram\'er-Rao Bound (CRB) assesses sensing performance. Simulation results demonstrate the framework's effectiveness in ensuring reliable sensing, high data rates, and secure communication.