Abstract:Fluid antenna multiple access (FAMA), enabled by the fluid antenna system (FAS), offers a new and straightforward solution to massive connectivity. Previous results on FAMA were primarily based on narrowband channels. This paper studies the adoption of FAMA within the fifth-generation (5G) orthogonal frequency division multiplexing (OFDM) framework, referred to as OFDM-FAMA, and evaluates its performance in broadband multipath channels. We first design the OFDM-FAMA system, taking into account 5G channel coding and OFDM modulation. Then, the system's achievable rate is analyzed, and an algorithm to approximate the FAS configuration at each user is proposed based on the rate. Extensive link-level simulation results reveal that OFDM-FAMA can significantly improve the multiplexing gain over the OFDM system with fixed-position antenna (FPA) users, especially when robust channel coding is applied and the number of radio-frequency (RF) chains at each user is small.
Abstract:This paper investigates joint device identification, channel estimation, and symbol detection for LEO satellite-enabled grant-free random access systems, specifically targeting scenarios where remote Internet-of-Things (IoT) devices operate without global navigation satellite system (GNSS) assistance. Considering the constrained power consumption of these devices, the large differential delay and Doppler shift are handled at the satellite receiver. We first propose a spreading-based multi-frame transmission scheme with orthogonal time-frequency space (OTFS) modulation to mitigate the doubly dispersive effect in time and frequency, and then analyze the input-output relationship of the system. Next, we propose a receiver structure based on three modules: a linear module for identifying active devices that leverages the generalized approximate message passing algorithm to eliminate inter-user and inter-carrier interference; a non-linear module that employs the message passing algorithm to jointly estimate the channel and detect the transmitted symbols; and a third module that aims to exploit the three-dimensional block channel sparsity in the delay-Doppler-angle domain. Soft information is exchanged among the three modules by careful message scheduling. Furthermore, the expectation-maximization algorithm is integrated to adjust the phase rotation caused by the fractional Doppler and to learn the hyperparameters in the priors. Finally, a convolutional neural network is incorporated to enhance the symbol detection. Simulation results demonstrate that the proposed transmission scheme boosts the system performance, and that the designed algorithms significantly outperform conventional methods in terms of device identification, channel estimation, and symbol detection.
Abstract:Learned image compression (LIC) methods often employ symmetrical encoder and decoder architectures, inevitably increasing decoding time. However, practical scenarios demand an asymmetric design, where the decoder requires low complexity to cater to diverse low-end devices, while the encoder can accommodate higher complexity to improve coding performance. In this paper, we propose an asymmetric lightweight learned image compression (AsymLLIC) architecture with a novel training scheme, enabling the gradual substitution of complex decoding modules with simpler ones. Building upon this approach, we conduct a comprehensive comparison of different decoder network structures to strike a better trade-off between complexity and compression performance. Experimental results validate the efficiency of our proposed method, which not only achieves comparable performance to VVC but also offers a lightweight decoder with only 51.47 GMACs of computation and 19.65M parameters. Furthermore, this design methodology can be easily applied to any LIC model, enabling the practical deployment of LIC techniques.
Abstract:Neural image compression often faces a challenging trade-off among rate, distortion and perception. While most existing methods typically focus on either achieving high pixel-level fidelity or optimizing for perceptual metrics, we propose a novel approach that simultaneously addresses both aspects for a fixed neural image codec. Specifically, we introduce a plug-and-play module at the decoder side that leverages a latent diffusion process to transform the decoded features, enhancing either low distortion or high perceptual quality without altering the original image compression codec. Our approach facilitates fusion of original and transformed features without additional training, enabling users to flexibly adjust the balance between distortion and perception during inference. Extensive experimental results demonstrate that our method significantly enhances the pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we can achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR.
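The training-free fusion of original and transformed features described above can be pictured with a minimal sketch. It assumes the fusion is a simple linear interpolation controlled by a single weight; the abstract does not specify the fusion rule, so the function `fuse_features` and the parameter `alpha` are hypothetical names used only for illustration.

```python
def fuse_features(original, transformed, alpha):
    """Blend two equally sized feature vectors.

    Assumption (not stated in the abstract): the fusion is a linear
    interpolation. alpha = 0 keeps the original decoded features,
    alpha = 1 uses only the transformed features.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(original) != len(transformed):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - alpha) * o + alpha * t
            for o, t in zip(original, transformed)]


# Hypothetical usage: sweep alpha at inference time, with no retraining.
decoded = [1.0, 2.0]
diffused = [3.0, 4.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, fuse_features(decoded, diffused, alpha))
```

Under this assumption, moving `alpha` between 0 and 1 is what lets a user adjust the balance between the two feature sets at inference time.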
Abstract:The fluid antenna (FA)-enabled multiple-input multiple-output (MIMO) system based on index modulation (IM), referred to as FA-IM, significantly enhances spectral efficiency (SE) compared to the conventional FA-assisted MIMO system. This paper proposes an innovative FA grouping-based IM (FAG-IM) system that improves performance by mitigating the high spatial correlation between multiple activated ports. A block grouping scheme is employed based on the spatial correlation model and the distribution structure of the ports. Then, a closed-form expression for the average bit error probability (ABEP) upper bound of the FAG-IM system is derived. To reduce the receiver complexity of the proposed system, the message passing mechanism is first incorporated into the FAG-IM system. Subsequently, within the approximate message passing (AMP) framework, an efficient structured AMP (S-AMP) detector is devised by leveraging the structural characteristics of the transmission signal vector. Simulation results confirm that the proposed FAG-IM system significantly outperforms the existing FA-IM system in the presence of spatial correlation. The derived ABEP curve aligns well with the numerical results, providing an efficient theoretical tool for evaluating the system performance. Additionally, simulation results demonstrate that the proposed low-complexity S-AMP detector not only reduces the time complexity to a linear scale but also substantially improves bit error rate (BER) performance compared to the minimum mean square error (MMSE) detector, thus facilitating the practical implementation of the FAG-IM system.
Abstract:We introduce OBI-Bench, a holistic benchmark crafted to systematically evaluate large multi-modal models (LMMs) on whole-process oracle bone inscriptions (OBI) processing tasks demanding expert-level domain knowledge and deliberate cognition. OBI-Bench includes 5,523 meticulously collected diverse-sourced images, covering five key domain problems: recognition, rejoining, classification, retrieval, and deciphering. These images span centuries of archaeological findings and years of research by front-line scholars, comprising multi-stage font appearances from excavation to synthesis, such as original oracle bones, inked rubbings, oracle bone fragments, cropped single characters, and handprinted characters. Unlike existing benchmarks, OBI-Bench focuses on advanced visual perception and reasoning with OBI-specific knowledge, challenging LMMs to perform tasks akin to those faced by experts. The evaluation of 6 proprietary LMMs as well as 17 open-source LMMs highlights the substantial challenges and demands posed by OBI-Bench. Even the latest versions of GPT-4o, Gemini 1.5 Pro, and Qwen-VL-Max are still far from public-level humans in some fine-grained perception tasks. However, they perform at a level comparable to untrained humans in the deciphering task, indicating remarkable capabilities in offering new interpretative perspectives and generating creative guesses. We hope OBI-Bench can help the community develop domain-specific multi-modal foundation models for ancient language research and delve deeper to discover and enhance these untapped potentials of LMMs.
Abstract:The widespread use of image acquisition technologies, along with advances in facial recognition, has raised serious privacy concerns. Face de-identification usually refers to the process of concealing or replacing personal identifiers, which is regarded as an effective means to protect the privacy of facial images. A significant number of methods for face de-identification have been proposed in recent years. In this survey, we provide a comprehensive review of state-of-the-art face de-identification methods, categorized into three levels: pixel-level, representation-level, and semantic-level techniques. We systematically evaluate these methods based on two key criteria, the effectiveness of privacy protection and preservation of image utility, highlighting their advantages and limitations. Our analysis includes qualitative and quantitative comparisons of the main algorithms, demonstrating that deep learning-based approaches, particularly those using Generative Adversarial Networks (GANs) and diffusion models, have achieved significant advancements in balancing privacy and utility. Experimental results reveal that while recent methods demonstrate strong privacy protection, trade-offs remain in visual fidelity and computational complexity. This survey not only summarizes the current landscape but also identifies key challenges and future research directions in face de-identification.
Abstract:Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but the role of wireless networks in supporting LLMs has not been thoroughly explored. In this paper, we propose a wireless distributed Mixture of Experts (WDMoE) architecture to enable collaborative deployment of LLMs across edge servers at the base station (BS) and mobile devices in wireless networks. Specifically, we decompose the MoE layer in LLMs by placing the gating network and the preceding neural network layer at the BS, while distributing the expert networks among the devices. This deployment leverages the parallel inference capabilities of expert networks on mobile devices, effectively utilizing the limited computing and caching resources of these devices. Accordingly, we develop a performance metric for WDMoE-based LLMs, which accounts for both model capability and latency. To minimize the latency while maintaining accuracy, we jointly optimize expert selection and bandwidth allocation based on the performance metric. Moreover, we build a hardware testbed using NVIDIA Jetson kits to validate the effectiveness of WDMoE. Both theoretical simulations and practical hardware experiments demonstrate that the proposed method can significantly reduce the latency without compromising LLM performance.
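The split between a gating network at the BS and expert networks on the devices can be illustrated with a generic top-k gating sketch. This is standard softmax top-k routing, not the paper's joint expert-selection and bandwidth-allocation optimization; `select_experts`, `gate_logits`, and `k` are hypothetical names, and each expert index is assumed to correspond to one mobile device.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts(gate_logits, k):
    """Pick the k experts with the largest gate probability.

    Returns a dict mapping expert index to its renormalized weight.
    Assumption: this is plain top-k routing; the latency-aware
    selection proposed in the paper is not modeled here.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Hypothetical usage: the BS computes gate logits for one token and
# forwards the token only to the devices hosting the selected experts.
weights = select_experts([2.0, 0.5, 1.0, -1.0], k=2)
print(weights)
```

A latency-aware variant would additionally weigh each candidate expert by the link quality and compute capability of its host device, which is the part the abstract attributes to the joint optimization.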
Abstract:Weakly-supervised methods typically guide pixel-wise training by comparing the predictions to single-level labels that contain diverse segmentation-related information at once. As a result, they struggle to represent delicate feature differences between nodule and background regions and are misled by incorrect information, resulting in underfitting or overfitting in the segmentation predictions. In this work, we propose a weakly-supervised network that generates multi-level labels from four-point annotation to refine diverse constraints for delicate nodule segmentation. The Distance-Similarity Fusion Prior, which refers to the point annotations, filters out information irrelevant to nodules. The bounding box and pure foreground/background labels, generated from the point annotation, guarantee the rationality of the prediction in the arrangement of target localization and the spatial distribution of target/background regions, respectively. Our proposed network outperforms existing weakly-supervised methods on two public datasets with respect to accuracy and robustness, improving the applicability of deep-learning-based segmentation in the clinical practice of thyroid nodule diagnosis.
Abstract:Integrated sensing and communication (ISAC) is a very promising technology designed to provide both high-rate communication capabilities and sensing capabilities. However, in massive multi-user multiple-input multiple-output ISAC (Massive MU MIMO-ISAC) systems, dense user access creates a serious multi-user interference (MUI) problem, leading to degradation of communication performance. To alleviate this problem, we propose a decentralized baseband processing (DBP) precoding method. We first model the MUI of dense-user scenarios, taking minimization of the Cramér-Rao bound (CRB) as the objective function. Hybrid precoding is an attractive ISAC technique, and hybrid precoding using partially connected structures (PCS) can effectively reduce hardware cost and power consumption. We mitigate the MUI between dense users based on Tomlinson-Harashima precoding (THP). We demonstrate the effectiveness of the proposed method through simulation experiments. Compared with existing methods, it can effectively improve the communication data rate and energy efficiency in dense user access scenarios, and reduce the hardware complexity of Massive MU MIMO-ISAC systems. The experimental results demonstrate the usefulness of our method for alleviating the MUI problem in ISAC systems for dense user access scenarios.