Abstract: We present InstantSticker, a disentangled reconstruction pipeline based on Image-Based Lighting (IBL) that focuses on highly realistic decal blending, simulates stickers attached to the reconstructed surface, and allows instant editing and real-time rendering. To give the decal a stereoscopic impression, we introduce a shadow factor into IBL that is adaptively optimized during training. This allows the shadow brightness of surfaces to be accurately decomposed rather than baked into the diffuse color, ensuring that the edited texture exhibits authentic shading. To address the warping and blurriness of previous methods, we apply As-Rigid-As-Possible (ARAP) parameterization to pre-unfold a specified area of the mesh and combine the local UV mapping with a neural texture map to better express high-frequency details in that area. For instant editing, we use the Disney BRDF model, explicitly defining material colors with a 3-channel diffuse albedo. This enables instant replacement of albedo RGB values during editing, avoiding the prolonged optimization required by previous approaches. In our experiments, we introduce the Ratio Variance Warping (RVW) metric to evaluate the local geometric warping of the decal area. Extensive experimental results demonstrate that our method surpasses previous decal blending methods in editing quality, editing speed, and rendering speed, achieving state-of-the-art performance.
Abstract: This paper investigates joint device activity detection and channel estimation for grant-free random access in low-Earth-orbit (LEO) satellite communications. We consider uplink communications from multiple single-antenna terrestrial users to a LEO satellite equipped with a multi-antenna uniform planar array, where orthogonal frequency division multiplexing (OFDM) modulation is adopted. To combat the severe Doppler shift, a transmission scheme is proposed in which the discrete prolate spheroidal basis expansion model (DPS-BEM) is introduced to reduce the number of unknown channel parameters. The vector approximate message passing (VAMP) algorithm is then employed to approximate the minimum mean square error estimate of the channel, and a Markov random field is incorporated to capture the channel sparsity. Meanwhile, the expectation-maximization (EM) approach is integrated to learn the hyperparameters of the priors. Finally, active devices are detected by calculating the energy of the estimated channel. Simulation results demonstrate that the proposed method outperforms conventional algorithms in terms of activity error rate and channel estimation precision.
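The final detection step described above can be illustrated with a minimal sketch. It assumes that a device is declared active when the energy of its estimated channel exceeds a fixed threshold; the abstract only states that the energy is calculated, so the thresholding rule, the function name `detect_active_devices`, the threshold value, and the toy estimates are all hypothetical.

```python
def detect_active_devices(channel_estimates, threshold):
    """Return indices of devices whose estimated-channel energy exceeds threshold.

    channel_estimates: one list of complex channel coefficients per device.
    threshold: hypothetical decision threshold (not taken from the paper).
    """
    active = []
    for k, h in enumerate(channel_estimates):
        # Energy of the estimate: sum of squared magnitudes of its coefficients.
        energy = sum(abs(c) ** 2 for c in h)
        if energy > threshold:
            active.append(k)
    return active


# Toy estimates for three devices (illustrative values only).
estimates = [
    [0.9 + 0.2j, -0.7 + 0.1j],   # large energy
    [0.01 + 0.0j, 0.0 - 0.02j],  # near-zero energy
    [0.5 - 0.5j, 0.3 + 0.4j],    # large energy
]
print(detect_active_devices(estimates, threshold=0.1))  # prints [0, 2]
```

In practice the threshold would have to be tuned against the noise level and the desired trade-off between missed detections and false alarms.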
Abstract: Existing works on machine learning (ML)-empowered wireless communication primarily focus on monolithic scenarios and single tasks. However, with the rapid growth in communication task classes and their varied requirements in future 6G systems, this approach is unsustainable. Therefore, identifying a paradigm that enables a universal model to solve multiple physical-layer tasks across diverse scenarios is crucial for future system evolution. This paper aims to fundamentally address the problem of ML model generalization across diverse scenarios and tasks by unleashing multi-modal feature integration capabilities in future systems. Given the universality of electromagnetic propagation theory, the communication process is determined by the scattering environment, which can be characterized more comprehensively by cross-modal perception, thus providing sufficient information for all communication tasks across varied environments. This fact motivates us to propose a transformative two-stage multi-modal pre-training and downstream task adaptation paradigm...
Abstract: Existing wireless video transmission schemes conduct video coding directly at the pixel level, neglecting the inner semantics contained in videos. In this paper, we propose a wireless video semantic communication framework, abbreviated as WVSC, which integrates the idea of semantic communication into wireless video transmission scenarios. WVSC first encodes the original video frames as semantic frames and then conducts video coding on these compact representations, enabling video coding at the semantic level rather than the pixel level. Moreover, to further reduce the communication overhead, a reference semantic frame is introduced to replace the per-frame motion vectors used in common video coding methods. At the receiver, multi-frame compensation (MFC) is proposed to produce the compensated current semantic frame with a multi-frame fusion attention module. With both the reference frame transmission and MFC, the bandwidth efficiency improves while satisfactory video transmission performance is maintained. Experimental results verify a PSNR gain for WVSC of about 1 dB over other DL-based methods, e.g., DVSC, and about 2 dB over traditional schemes.
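Since the reported gains are stated in PSNR, a short sketch of the standard PSNR definition may help in reading them. It assumes 8-bit pixel values (peak 255) and frames flattened into plain lists; the function name `psnr` is illustrative, and this is not the authors' evaluation code.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


frame = [52, 55, 61, 66]
noisy = [54, 55, 60, 65]
print(round(psnr(frame, noisy), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to an MSE that is lower by a factor of about 10^0.1, i.e. roughly 21% lower.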
Abstract: To achieve ubiquitous connectivity in next-generation networks through aerospace communications while maintaining high data rates, Terahertz (THz) band communications (0.1-10 THz) with large continuous bandwidths are considered a promising candidate technology. However, key enabling techniques and practical implementations of THz communications for aerospace applications remain limited. In this paper, the wireless channel characteristics, enabling communication techniques, and networking strategies for THz aerospace communications are investigated, aiming to assess their feasibility and encourage future research efforts toward system realization. Specifically, the wireless channel characteristics across various altitudes and scenarios are first analyzed, focusing on modeling the interaction between the THz wave and the external environment, from ground to outer space. Next, key enabling communication technologies are discussed, including multiple-input multiple-output (MIMO) techniques, beam alignment and tracking, integrated communication and radar sensing (ICARS), and resource allocation for networking. Finally, the existing challenges and possible future directions are summarized.
Abstract: The transition from isolated systems to integrated solutions has driven the development of space-air-ground integrated networks (SAGIN) as well as the integration of communication and radar sensing functionalities. By leveraging the unique properties of the Terahertz (THz) band, THz joint communication and radar sensing (JCRS) supports high-speed communication and precise sensing, addressing the growing demands of SAGIN for connectivity and environmental awareness. However, most existing THz studies focus on terrestrial and static scenarios, with limited consideration of the dynamic and non-terrestrial environments of SAGIN. In this paper, THz JCRS techniques for SAGIN are comprehensively investigated. Specifically, propagation characteristics and channel models of THz waves in non-terrestrial environments are analyzed. A link capacity comparison across the millimeter-wave, THz, and free-space optical frequency bands is conducted to highlight the advantages of THz frequencies. Moreover, novel JCRS waveform design strategies are presented that benefit both communication and radar sensing, while networking strategies are developed to overcome challenges in SAGIN such as high mobility. Furthermore, advancements in THz device technologies, including antennas and amplifiers, are reviewed to assess their roles in enabling practical JCRS implementations.
Abstract: Medical Phrase Grounding (MPG) maps radiological findings described in medical reports to specific regions in medical images. The primary obstacle hindering progress in MPG is the scarcity of annotated data available for training and validation. We propose anatomical grounding as an in-domain pre-training task that aligns anatomical terms with corresponding regions in medical images, leveraging large-scale datasets such as Chest ImaGenome. Our empirical evaluation on MS-CXR demonstrates that anatomical grounding pre-training significantly improves performance in both zero-shot and fine-tuning settings, outperforming state-of-the-art MPG models. Our fine-tuned model achieves state-of-the-art performance on MS-CXR with an mIoU of 61.2, demonstrating the effectiveness of anatomical grounding pre-training for MPG.
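For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how a mean intersection-over-union could be computed. It assumes that predicted and ground-truth regions are axis-aligned boxes given as (x1, y1, x2, y2) and that one prediction is paired with one ground-truth box; the abstract does not specify the region format or the exact evaluation protocol, and the names `box_iou` and `mean_iou` are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predictions, ground_truths):
    """Average IoU over paired predicted and ground-truth boxes."""
    if len(predictions) != len(ground_truths) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(box_iou(p, g) for p, g in zip(predictions, ground_truths))
    return total / len(predictions)


preds = [(0, 0, 2, 2), (0, 0, 1, 1)]
gts = [(0, 0, 2, 2), (2, 2, 3, 3)]
print(mean_iou(preds, gts))  # prints 0.5 (one perfect match, one miss)
```

A score such as 61.2 is presumably this average expressed as a percentage.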
Abstract: Recent advancements in deep learning have driven significant progress in lossless image compression. With the emergence of Large Language Models (LLMs), preliminary attempts have been made to leverage the extensive prior knowledge embedded in these pretrained models to enhance lossless image compression, particularly by improving the entropy model. However, a significant challenge remains in bridging the gap between the textual prior knowledge within LLMs and lossless image compression. To tackle this challenge and unlock the potential of LLMs, this paper introduces a novel paradigm for lossless image compression that incorporates LLMs with visual prompts. Specifically, we first generate a lossy reconstruction of the input image as a visual prompt, from which we extract features to serve as visual embeddings for the LLM. The residual between the original image and the lossy reconstruction is then fed into the LLM along with these visual embeddings, enabling the LLM to function as an entropy model that predicts the probability distribution of the residual. Extensive experiments on multiple benchmark datasets demonstrate that our method achieves state-of-the-art compression performance, surpassing both traditional and learning-based lossless image codecs. Furthermore, our approach can be easily extended to images from other domains, such as medical and screen content images, achieving impressive performance. These results highlight the potential of LLMs for lossless image compression and may inspire further research in related directions.
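The residual-coding idea described above can be sketched in a few lines. In this toy example a fixed probability table stands in for the distribution that the LLM-based entropy model would predict; the table, the pixel values, and the function names are hypothetical and only illustrate why a sharper probability model yields fewer bits.

```python
import math


def compute_residual(original, lossy):
    """Per-pixel residual between the original and its lossy reconstruction."""
    return [o - l for o, l in zip(original, lossy)]


def ideal_code_length_bits(symbols, prob):
    """Ideal entropy-coded length: sum of -log2 p(s) over all symbols."""
    return sum(-math.log2(prob[s]) for s in symbols)


def reconstruct(lossy, residual):
    """Lossless reconstruction: lossy image plus decoded residual."""
    return [l + r for l, r in zip(lossy, residual)]


original = [10, 12, 11, 13]
lossy = [10, 11, 11, 12]
res = compute_residual(original, lossy)  # [0, 1, 0, 1]

# Stand-in for the predicted distribution over residual values (assumed).
prob = {0: 0.5, 1: 0.25, -1: 0.25}
print(ideal_code_length_bits(res, prob))  # prints 6.0
print(reconstruct(lossy, res) == original)  # prints True
```

An arithmetic coder driven by the predicted probabilities can approach this ideal length, which is why improving the entropy model directly improves the compression ratio.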
Abstract: A latent denoising semantic communication (SemCom) framework is proposed for robust image transmission over noisy channels. By incorporating a learnable latent denoiser into the receiver, the received signals are preprocessed to effectively remove the channel noise and recover the semantic information, thereby enhancing the quality of the decoded images. Specifically, a latent denoising mapping is established by an iterative residual learning approach to improve the denoising efficiency while ensuring stable performance. Moreover, the channel signal-to-noise ratio (SNR) is used to predict the latent similarity score (SS) for conditional denoising, where the number of denoising steps is adapted based on the predicted SS sequence, further reducing the communication latency. Finally, simulations demonstrate that the proposed framework can effectively and efficiently remove channel noise at various levels and reconstruct visually appealing images.
Abstract: This paper investigates a movable antenna (MA)-enhanced multiple-input multiple-output (MIMO) system designed to enhance communication performance. We aim to maximize the energy efficiency (EE) under statistical channel state information (S-CSI) through a joint optimization of the transmit covariance matrix and the antenna position vectors (APVs). To solve the stochastic problem, we consider the scenario with a large number of antennas and resort to the deterministic equivalent (DE) technique to reformulate the system EE with respect to the transmit variables, i.e., the transmit covariance matrix and the transmit APV, and the receive variable, i.e., the receive APV. Then, we propose an alternating optimization (AO) algorithm that updates the transmit variables and the receive variables in turn to maximize the system EE. Our numerical results reveal that the proposed MA-enhanced system can significantly improve EE compared to several benchmark schemes and that the optimal performance can be achieved with movement regions of finite size for the MAs.