Abstract:Orthogonal time frequency space (OTFS) modulation is anticipated to be a promising candidate for supporting integrated sensing and communications (ISAC) systems, which is considered as a pivotal technique for realizing next generation wireless networks. In this paper, we develop a minimum bit error rate (BER) precoder design for an OTFS-based ISAC system. In particular, the BER minimization problem takes into account the maximum available transmission power budget and the required sensing performance. Different from prior studies that considered ISAC in the time-frequency (TF) domain, we devise the precoder from the perspective of the delay-Doppler (DD) domain by exploiting the equivalent DD domain channel due to the fact that the DD domain channel generally tends to be sparse and quasi-static, which can facilitate a low-overhead ISAC system design. To address the non-convex optimization design problem, we resort to optimizing the lower bound of the derived average BER by adopting Jensen's inequality. Subsequently, the formulated problem is decoupled into two independent sub-problems via singular value decomposition (SVD) methodology. We then theoretically analyze the feasibility conditions of the proposed problem and present a low-complexity iterative solution via leveraging the Lagrangian duality approach. Simulation results verify the effectiveness of our proposed precoder compared to the benchmark schemes and reveal the interplay between sensing and communication for dual-functional precoder design, indicating a trade-off where transmission efficiency is sacrificed for increasing transmission reliability and sensing accuracy.
Abstract:Impulse radio ultra-wideband (IR-UWB) signals stand out for their high temporal resolution, low cost, and large bandwidth, making them a highly promising option for integrated sensing and communication (ISAC) systems. In this paper, we design an ISAC system for a bi-static passive sensing scenario that accommodates multiple targets. Specifically, we introduce two typical modulation schemes, PPM and BPSK, for data transmission. The essential coupling between sensing and communication is examined through the Fisher information matrix (FIM). Accordingly, we introduce a pilot-based decoupling approach that relies on known time-delays, as well as a differential decoupling strategy that uses a known starting symbol position. Finally, we assess the sensing and communication performance under various modulation and demodulation schemes under the constraints of current UWB standards. This assessment utilizes the Cramer-Rao Lower Bound (CRLB) for sensing and the Shannon capacity limit for communication, offering theoretical insights into choosing suitable data signal processing methods in real-world applications.
Abstract:Despite advancements in enhancing LLM safety against jailbreak attacks, evaluating LLM defenses remains a challenge, with current methods often lacking explainability and generalization to complex scenarios, leading to incomplete assessments (e.g., direct judgment without reasoning, low F1 score of GPT-4 in complex cases, bias in multilingual scenarios). To address this, we present JAILJUDGE, a comprehensive benchmark featuring diverse risk scenarios, including synthetic, adversarial, in-the-wild, and multilingual prompts, along with high-quality human-annotated datasets. The JAILJUDGE dataset includes over 35k+ instruction-tune data with reasoning explainability and JAILJUDGETEST, a 4.5k+ labeled set for risk scenarios, and a 6k+ multilingual set across ten languages. To enhance evaluation with explicit reasoning, we propose the JailJudge MultiAgent framework, which enables explainable, fine-grained scoring (1 to 10). This framework supports the construction of instruction-tuning ground truth and facilitates the development of JAILJUDGE Guard, an end-to-end judge model that provides reasoning and eliminates API costs. Additionally, we introduce JailBoost, an attacker-agnostic attack enhancer, and GuardShield, a moderation defense, both leveraging JAILJUDGE Guard. Our experiments demonstrate the state-of-the-art performance of JailJudge methods (JailJudge MultiAgent, JAILJUDGE Guard) across diverse models (e.g., GPT-4, Llama-Guard) and zero-shot scenarios. JailBoost and GuardShield significantly improve jailbreak attack and defense tasks under zero-shot settings, with JailBoost enhancing performance by 29.24% and GuardShield reducing defense ASR from 40.46% to 0.15%.
Abstract:The Direct Segment Anything Model (DirectSAM) excels in class-agnostic contour extraction. In this paper, we explore its use by applying it to optical remote sensing imagery, where semantic contour extraction-such as identifying buildings, road networks, and coastlines-holds significant practical value. Those applications are currently handled via training specialized small models separately on small datasets in each domain. We introduce a foundation model derived from DirectSAM, termed DirectSAM-RS, which not only inherits the strong segmentation capability acquired from natural images, but also benefits from a large-scale dataset we created for remote sensing semantic contour extraction. This dataset comprises over 34k image-text-contour triplets, making it at least 30 times larger than individual dataset. DirectSAM-RS integrates a prompter module: a text encoder and cross-attention layers attached to the DirectSAM architecture, which allows flexible conditioning on target class labels or referring expressions. We evaluate the DirectSAM-RS in both zero-shot and fine-tuning setting, and demonstrate that it achieves state-of-the-art performance across several downstream benchmarks.
Abstract:This paper introduces an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for smart healthcare systems. The framework is evaluated in the context of smart healthcare, optimising the transmit beamforming matrix and semantic extraction ratio for improved data rates, sensing accuracy, and general data protection regulation (GDPR) compliance, while considering IoRT device computing capabilities. Semantic metrics such as semantic transmission rate and semantic secrecy rate are derived to evaluate data rate performance and GDPR risk, respectively, while the Cram\'er-Rao Bound (CRB) assesses sensing performance. Simulation results demonstrate the framework's effectiveness in ensuring reliable sensing, high data rates, and secure communication.
Abstract:As various types of crime continue to threaten public safety and economic development, predicting the occurrence of multiple types of crimes becomes increasingly vital for effective prevention measures. Although extensive efforts have been made, most of them overlook the heterogeneity of different crime categories and fail to address the issue of imbalanced spatial distribution. In this work, we propose a Spatial-Temporal Mixture-of-Graph-Experts (ST-MoGE) framework for collective multiple-type crime prediction. To enhance the model's ability to identify diverse spatial-temporal dependencies and mitigate potential conflicts caused by spatial-temporal heterogeneity of different crime categories, we introduce an attentive-gated Mixture-of-Graph-Experts (MGEs) module to capture the distinctive and shared crime patterns of each crime category. Then, we propose Cross-Expert Contrastive Learning(CECL) to update the MGEs and force each expert to focus on specific pattern modeling, thereby reducing blending and redundancy. Furthermore, to address the issue of imbalanced spatial distribution, we propose a Hierarchical Adaptive Loss Re-weighting (HALR) approach to eliminate biases and insufficient learning of data-scarce regions. To evaluate the effectiveness of our methods, we conduct comprehensive experiments on two real-world crime datasets and compare our results with twelve advanced baselines. The experimental results demonstrate the superiority of our methods.
Abstract:Power Line Autonomous Inspection (PLAI) plays a crucial role in the construction of smart grids due to its great advantages of low cost, high efficiency, and safe operation. PLAI is completed by accurately detecting the electrical components and defects in the aerial images captured by Unmanned Aerial Vehicles (UAVs). However, the visible quality of aerial images is inevitably degraded by adverse weather like haze, rain, or snow, which are found to drastically decrease the detection accuracy in our research. To circumvent this problem, we propose a new task of Power Line Aerial Image Restoration under Adverse Weather (PLAIR-AW), which aims to recover clean and high-quality images from degraded images with bad weather thus improving detection performance for PLAI. In this context, we are the first to release numerous corresponding datasets, namely, HazeCPLID, HazeTTPLA, HazeInsPLAD for power line aerial image dehazing, RainCPLID, RainTTPLA, RainInsPLAD for power line aerial image deraining, SnowCPLID, SnowInsPLAD for power line aerial image desnowing, which are synthesized upon the public power line aerial image datasets of CPLID, TTPLA, InsPLAD following the mathematical models. Meanwhile, we select numerous state-of-the-art methods from image restoration community as the baseline methods for PLAIR-AW. At last, we conduct large-scale empirical experiments to evaluate the performance of baseline methods on the proposed datasets. The proposed datasets and trained models are available at https://github.com/ntuhubin/PLAIR-AW.
Abstract:Generating high-fidelity, temporally consistent videos in autonomous driving scenarios faces a significant challenge, e.g. problematic maneuvers in corner cases. Despite recent video generation works are proposed to tackcle the mentioned problem, i.e. models built on top of Diffusion Transformers (DiT), works are still missing which are targeted on exploring the potential for multi-view videos generation scenarios. Noticeably, we propose the first DiT-based framework specifically designed for generating temporally and multi-view consistent videos which precisely match the given bird's-eye view layouts control. Specifically, the proposed framework leverages a parameter-free spatial view-inflated attention mechanism to guarantee the cross-view consistency, where joint cross-attention modules and ControlNet-Transformer are integrated to further improve the precision of control. To demonstrate our advantages, we extensively investigate the qualitative comparisons on nuScenes dataset, particularly in some most challenging corner cases. In summary, the effectiveness of our proposed method in producing long, controllable, and highly consistent videos under difficult conditions is proven to be effective.
Abstract:Knowledge distillation (KD) is an effective method for compressing models in object detection tasks. Due to limited computational capability, UAV-based object detection (UAV-OD) widely adopt the KD technique to obtain lightweight detectors. Existing methods often overlook the significant differences in feature space caused by the large gap in scale between the teacher and student models. This limitation hampers the efficiency of knowledge transfer during the distillation process. Furthermore, the complex backgrounds in UAV images make it challenging for the student model to efficiently learn the object features. In this paper, we propose a novel knowledge distillation framework for UAV-OD. Specifically, a progressive distillation approach is designed to alleviate the feature gap between teacher and student models. Then a new feature alignment method is provided to extract object-related features for enhancing student model's knowledge reception efficiency. Finally, extensive experiments are conducted to validate the effectiveness of our proposed approach. The results demonstrate that our proposed method achieves state-of-the-art (SoTA) performance in two UAV-OD datasets.
Abstract:Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk learning specific response formats rather than effectively extracting useful information from support data in FSC tasks. In this paper, we investigate LVLMs' performance in FSC and identify key issues such as insufficient learning and the presence of severe positional biases. To tackle the above challenges, we adopt the meta-learning strategy to teach models "learn to learn". By constructing a rich set of meta-tasks for instruction fine-tuning, LVLMs enhance the ability to extract information from few-shot support data for classification. Additionally, we further boost LVLM's few-shot learning capabilities through label augmentation and candidate selection in the fine-tuning and inference stage, respectively. Label augmentation is implemented via a character perturbation strategy to ensure the model focuses on support information. Candidate selection leverages attribute descriptions to filter out unreliable candidates and simplify the task. Extensive experiments demonstrate that our approach achieves superior performance on both general and fine-grained datasets. Furthermore, our candidate selection strategy has been proven beneficial for training-free LVLMs.