Abstract: This paper investigates the optimization of transmitting the encoder outputs, termed semantic features (SFs), in semantic communication (SC). We begin by modeling the entire communication process from the encoder output to the decoder input, encompassing the physical channel and all transceiver operations, as the SF channel, thereby establishing an encoder-SF channel-decoder pipeline. In contrast to prior studies that assume a fixed SF channel, we note that the SF channel is configurable, as its characteristics are shaped by various transmission and reception strategies, such as power allocation. Based on this observation, we formulate the SF channel optimization problem under a mutual information constraint between the SFs and their reconstructions, and analytically derive the optimal SF channel under a linear encoder-decoder structure and a Gaussian source assumption. Building upon this theoretical foundation, we propose a joint optimization framework for the encoder-decoder and the SF channel, applicable to both analog and digital SC. To realize the optimized SF channel, we also propose a physical-layer calibration strategy that enables real-time power control and adaptation to varying channel conditions. Simulation results demonstrate that the proposed SF channel optimization achieves superior task performance across various communication environments.
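As a minimal illustration of the kind of optimization involved (not the paper's derived solution), the sketch below allocates power across independent Gaussian SF dimensions via water-filling-style bisection so that a mutual-information target between the SFs and their reconstructions is met; the per-dimension gain model, `gains`, and `mi_target` are illustrative assumptions.

```python
import numpy as np

def sf_power_allocation(gains, mi_target, tol=1e-9):
    """Water-filling-style power allocation over SF dimensions.

    Minimizes total transmit power subject to a mutual-information
    constraint sum_i 0.5*log2(1 + p_i * g_i) >= mi_target, assuming
    independent Gaussian SF dimensions with effective gains g_i.
    """
    g = np.asarray(gains, dtype=float)

    def mi(mu):
        p = np.maximum(0.0, mu - 1.0 / g)   # water level mu minus inverse gain
        return 0.5 * np.sum(np.log2(1.0 + p * g)), p

    lo, hi = 0.0, 1.0
    while mi(hi)[0] < mi_target:            # grow bracket until target is reachable
        hi *= 2.0
    while hi - lo > tol:                    # bisection on the water level
        mid = 0.5 * (lo + hi)
        if mi(mid)[0] < mi_target:
            lo = mid
        else:
            hi = mid
    return mi(hi)[1]

powers = sf_power_allocation(gains=[2.0, 1.0, 0.25], mi_target=3.0)
print(powers, powers.sum())
```

Under this toy model, a dimension receives no power once the water level falls below its inverse gain, which is the usual water-filling behavior.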
Abstract: Accurate, resource-efficient localization and tracking enables numerous location-aware services in next-generation wireless networks. However, existing machine learning-based methods often require large labeled datasets while overlooking spectrum and energy efficiency. To fill this gap, we propose LocDreamer, a world model (WM)-based framework for joint target tracking and scheduling of localization anchors. LocDreamer learns a WM that captures a latent representation of the target motion and localization environment, thereby generating synthetic measurements to imagine arbitrary anchor deployments. These measurements enable imagination-driven training of both the tracking model and the reinforcement learning (RL)-based anchor scheduler that activates only the most informative anchors, significantly reducing energy and signaling costs while preserving high tracking accuracy. Experiments on a real-world indoor dataset demonstrate that LocDreamer substantially improves data efficiency and generalization, outperforming a conventional Bayesian filter with random scheduling by 37% in tracking accuracy and achieving 86% of the accuracy of the same model trained directly on real data.
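A rough sketch of the imagination-driven loop, under heavy assumptions: the learned WM is replaced by a hypothetical `world_model_step` that rolls a latent state forward and synthesizes range measurements for the currently scheduled anchors; anchor positions, noise levels, and the random stand-in scheduler are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
ANCHORS = {0: np.array([0.0, 0.0]), 1: np.array([5.0, 0.0]), 2: np.array([0.0, 5.0])}

def world_model_step(state, active_anchors):
    """Hypothetical learned WM: advances the latent state and synthesizes
    noisy range measurements only for the scheduled anchors."""
    next_state = state + rng.normal(0.0, 0.1, size=state.shape)   # latent dynamics
    meas = {a: np.linalg.norm(next_state[:2] - ANCHORS[a]) + rng.normal(0.0, 0.05)
            for a in active_anchors}
    return next_state, meas

# Imagination-driven rollout: the tracker and RL scheduler would be
# trained on these synthetic measurements instead of real activations.
state = np.zeros(4)
for t in range(10):
    active = rng.choice(list(ANCHORS), size=2, replace=False)     # stand-in scheduler
    state, meas = world_model_step(state, active)
    # ...update tracking model and RL scheduler from (state, meas) here...
```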
Abstract: Low Earth orbit (LEO) mega-constellations greatly extend the coverage and resilience of future wireless systems. Within these mega-constellations, laser inter-satellite links (LISLs) enable high-capacity, long-range connectivity. Existing LISL schemes often overlook the mechanical limitations of laser communication terminals (LCTs) and the non-uniform global traffic profile caused by uneven user and gateway distributions, leading to suboptimal throughput and underused LCTs/LISLs, especially when each satellite carries only a few LCTs. This paper investigates the joint optimization of LCT connections and traffic routing to maximize the constellation throughput, considering realistic LCT mechanics and the global traffic profile. The problem is formulated as an NP-hard mixed-integer program coupling LCT connections with flow-rate variables under link capacity constraints. Due to its intractability, we relax the coupling constraints via Lagrangian duality, decomposing the problem into a weighted graph-matching problem for LCT connections, weighted shortest-path routing tasks, and a linear program for rate allocation. Here, the Lagrange multipliers act as congestion weights between satellites, jointly guiding the matching, routing, and rate allocation. Subgradient descent optimizes the multipliers with provable convergence. Simulations using real-world constellation and terrestrial data show that our methods substantially improve network throughput by $35\%$--$145\%$ over existing non-joint approaches.
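The following sketch shows projected subgradient ascent on the per-link multipliers, assuming the decomposed subproblems (LCT matching, shortest-path routing, rate LP) are wrapped in a hypothetical `flows_fn` oracle; the diminishing step rule and toy usage are illustrative.

```python
import numpy as np

def subgradient_duals(flows_fn, capacities, iters=200, step0=1.0):
    """Projected subgradient ascent on per-link Lagrange multipliers.

    flows_fn(lam) is assumed to solve the decomposed subproblems
    (LCT matching, shortest-path routing, rate LP) for fixed
    multipliers lam and return the resulting per-link flows.
    """
    lam = np.zeros_like(capacities)
    for k in range(1, iters + 1):
        flows = flows_fn(lam)
        g = flows - capacities                                 # dual subgradient per link
        lam = np.maximum(0.0, lam + (step0 / np.sqrt(k)) * g)  # diminishing step, project >= 0
    return lam

# Toy usage: a stand-in oracle whose flows shrink as congestion prices rise.
caps = np.array([10.0, 5.0])
lam_star = subgradient_duals(lambda lam: 12.0 / (1.0 + lam), caps)
```

With this stand-in oracle the multipliers settle where flow meets capacity on congested links, which is the behavior the congestion-weight interpretation relies on.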
Abstract: Laser inter-satellite links (LISLs) in low Earth orbit (LEO) mega-constellations enable high-capacity backbone connectivity in non-terrestrial networks, but their management is challenged by limited laser communication terminals, mechanical pointing constraints, and rapidly time-varying network topologies. This paper studies the joint problem of LISL connection establishment, traffic routing, and flow-rate allocation under heterogeneous global traffic demand and gateway availability. We formulate the problem as a mixed-integer optimization over large-scale, time-varying constellation graphs and develop a Lagrangian dual decomposition that interprets per-link dual variables as congestion prices coordinating connectivity and routing decisions. To overcome the prohibitive latency of iterative dual updates, we propose DeepLaDu, a Lagrangian duality-guided deep learning framework that trains a graph neural network (GNN) to directly infer per-link (edge-level) congestion prices from the constellation state in a single forward pass. A subgradient-based edge-level loss enables scalable and stable training of DeepLaDu. We analyze the convergence and computational complexity of the proposed approach and evaluate it on realistic Starlink-like constellations with optical and traffic constraints. Simulation results show that DeepLaDu achieves up to 20\% higher network throughput than non-joint or heuristic baselines, while matching the performance of iterative dual optimization at orders-of-magnitude lower computation time, making it suitable for real-time operation in dynamic LEO networks.
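A hedged sketch of how a subgradient-based edge-level loss might be wired up: `EdgePricer` is a plain MLP stand-in for the paper's GNN, and the loss is constructed so that its gradient on each predicted price equals the negated dual subgradient (capacity minus flow); all names, features, and dimensions are assumptions.

```python
import torch

class EdgePricer(torch.nn.Module):
    """Stand-in for the GNN: maps per-edge features to congestion prices."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, 1), torch.nn.Softplus())   # prices >= 0

    def forward(self, edge_feats):
        return self.net(edge_feats).squeeze(-1)

def subgradient_loss(prices, flows, capacities):
    """Edge-level loss whose gradient on each price is -(flow - capacity),
    so a gradient step raises prices on overloaded edges and lowers
    them on idle ones, mimicking the dual update."""
    g = (flows - capacities).detach()
    return -(g * prices).sum()

model = EdgePricer(feat_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(8, 4)                       # illustrative edge features
flows, caps = torch.rand(8) * 2.0, torch.ones(8)
loss = subgradient_loss(model(feats), flows, caps)
opt.zero_grad(); loss.backward(); opt.step()
```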
Abstract: Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and costly under limited measurements. This paper proposes a vision-language model (VLM)-guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative paths. Starting from these priors, the DRT performs gradient-based refinement using measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2--4$\times$ faster convergence and 10--100$\times$ lower final parameter error compared with uniform or random initialization and random placement baselines, achieving sub-0.1\% mean relative error with only a few receivers. Complexity analyses indicate that the per-iteration time scales near-linearly with the number of materials and measurement setups, while VLM-guided placement reduces the measurements required for accurate recovery. Ablations over RT depth and ray counts confirm further accuracy gains without significant per-iteration overhead. These results demonstrate that semantic priors from VLMs effectively guide physics-based optimization for fast and reliable RF material estimation.
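A minimal sketch of the prior-then-refine pipeline, assuming a hypothetical category-to-conductivity table (the values below are placeholders, not the actual ITU-R entries) and a linear stand-in for the differentiable RT forward pass.

```python
import torch

# Placeholder conductivity priors in S/m; the category->prior mapping and
# values are assumptions, not the paper's exact ITU-R table.
MATERIAL_PRIORS = {"concrete": 0.015, "glass": 0.004, "wood": 0.002}

def vlm_initialize(categories):
    """Map VLM-inferred material categories to conductivity initializations."""
    return torch.tensor([MATERIAL_PRIORS[c] for c in categories],
                        requires_grad=True)

def refine(sigma, rss_model, rss_meas, steps=200, lr=1e-3):
    """Gradient-based refinement against measured RSS; rss_model is a
    stand-in for the differentiable RT forward pass."""
    opt = torch.optim.Adam([sigma], lr=lr)
    for _ in range(steps):
        loss = torch.mean((rss_model(sigma) - rss_meas) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return sigma.detach()

# Toy usage with a linear stand-in for the RT engine:
true = torch.tensor([0.02, 0.005])
A = torch.randn(6, 2)                    # 6 illustrative RSS measurements
sigma = vlm_initialize(["concrete", "glass"])
est = refine(sigma, lambda s: A @ s, A @ true)
```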
Abstract: The success of large-scale language models has established tokens as compact and meaningful units of natural-language representation, which motivates token communication over wireless channels, where tokens serve as the fundamental units of wireless transmission. We propose a context-aware token communication framework that uses a pretrained masked language model (MLM) as a shared contextual probability model between the transmitter (Tx) and receiver (Rx). At the Rx, we develop an iterative token detection method that jointly exploits MLM-guided contextual priors and channel observations from a Bayesian perspective. At the Tx, we additionally introduce a context-aware masking strategy that skips the transmission of highly predictable tokens to reduce the transmission rate. Simulation results demonstrate that the proposed framework substantially improves reconstructed sentence quality and supports effective rate adaptation under various channel conditions.
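One Bayesian detection step could look like the sketch below, assuming a Gaussian channel likelihood over a small candidate set; `detect_token`, the symbol table, and the stand-in MLM logits are illustrative, and the paper's iterative procedure would repeat such updates across token positions.

```python
import numpy as np

def detect_token(prior_logits, rx_symbol, symbol_table, noise_var):
    """One Bayesian detection step: combine the MLM contextual prior
    with a Gaussian channel likelihood over candidate tokens."""
    log_prior = prior_logits - np.logaddexp.reduce(prior_logits)   # normalize prior
    log_lik = -np.abs(rx_symbol - symbol_table) ** 2 / noise_var   # channel evidence
    log_post = log_prior + log_lik
    return int(np.argmax(log_post)), log_post

# Toy usage: 4 candidate tokens mapped to illustrative QPSK-like symbols.
symbols = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
prior = np.array([2.0, 0.1, 0.1, 0.1])          # stand-in MLM logits
tok, _ = detect_token(prior, rx_symbol=0.9 + 0.8j,
                      symbol_table=symbols, noise_var=0.5)
```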
Abstract: The proliferation of large-scale low Earth orbit (LEO) satellite constellations is driving the need for intelligent routing strategies that can effectively deliver data to terrestrial networks under rapidly time-varying topologies and intermittent gateway visibility. Leveraging the global control capabilities of a geostationary orbit (GEO)-resident software-defined networking (SDN) controller, we introduce opportunistic routing, which minimizes delivery delay by forwarding packets to any currently available ground gateway rather than a fixed destination. This makes it a promising approach for low-latency and robust data delivery in highly dynamic LEO networks. Specifically, we formulate a constrained stochastic optimization problem and employ a residual reinforcement learning framework to optimize opportunistic routing and reduce transmission delay. Simulation results over multiple days of orbital data demonstrate that our method achieves significant reductions in queue length compared to classical backpressure and other well-known queueing algorithms.
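For intuition, here is a sketch of a residual policy layered on classical backpressure scores, assuming the RL policy supplies an additive per-neighbor correction; all function names and the toy numbers are illustrative.

```python
import numpy as np

def backpressure_scores(q_self, q_neighbors, link_rates):
    """Classical backpressure: weight each neighbor by the queue
    differential times the link rate."""
    return (q_self - q_neighbors) * link_rates

def residual_route(q_self, q_neighbors, link_rates, residual):
    """Residual policy: the learned correction nudges the backpressure
    decision rather than replacing it."""
    scores = backpressure_scores(q_self, q_neighbors, link_rates) + residual
    return int(np.argmax(scores))

# Toy usage: in practice the residual would come from the RL policy network.
next_hop = residual_route(q_self=12.0,
                          q_neighbors=np.array([10.0, 7.0, 11.0]),
                          link_rates=np.array([1.0, 0.8, 1.2]),
                          residual=np.array([0.0, 0.5, -0.2]))
```

Keeping backpressure as the base policy means the residual learner starts from a throughput-stable baseline, which is a common motivation for residual RL.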
Abstract: Semantic communication (SemCom) improves communication efficiency by transmitting task-relevant information instead of raw bits and is expected to be a key technology for 6G networks. Recent advances in generative AI (GenAI) further enhance SemCom by enabling robust semantic encoding and decoding under limited channel conditions. However, these efficiency gains also introduce new security and privacy vulnerabilities. Due to the broadcast nature of wireless channels, eavesdroppers can also use powerful GenAI-based semantic decoders to recover private information from intercepted signals. Moreover, rapid advances in agentic AI enable eavesdroppers to perform long-term and adaptive inference through the integration of memory, external knowledge, and reasoning capabilities, allowing them to infer users' private behavior and intent beyond the transmitted content. Motivated by these emerging challenges, this paper comprehensively rethinks the security and privacy of SemCom systems in the age of generative and agentic AI. We first present a systematic taxonomy of eavesdropping threat models in SemCom systems. We then provide insights into how GenAI and agentic AI can enhance eavesdropping threats, and we highlight opportunities for leveraging GenAI and agentic AI to design privacy-preserving SemCom systems.
Abstract: While recent Large Vision-Language Models (LVLMs) exhibit strong multimodal reasoning abilities, they often produce ungrounded or hallucinated responses because they rely too heavily on linguistic priors instead of visual evidence. This limitation highlights the absence of a quantitative measure of how much these models actually use visual information during reasoning. We propose Draft and Refine (DnR), an agent framework driven by a question-conditioned utilization metric. The metric quantifies the model's reliance on visual evidence by first constructing a query-conditioned relevance map to localize question-specific cues and then measuring dependence through relevance-guided probabilistic masking. Guided by this metric, the DnR agent refines its initial draft using targeted feedback from external visual experts. Each expert's output (such as boxes or masks) is rendered as visual cues on the image, and the model is re-queried to select the response that yields the largest improvement in utilization. This process strengthens visual grounding without retraining or architectural changes. Experiments across VQA and captioning benchmarks show consistent accuracy gains and reduced hallucination, demonstrating that measuring visual utilization provides a principled path toward more interpretable and evidence-driven multimodal agent systems.
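A toy sketch of a question-conditioned utilization score, assuming utilization is measured as the relative drop in answer probability when relevance-guided masking is applied; `masked_prob_fn` stands in for re-querying the LVLM on a masked image, and the exponential stand-in below is purely illustrative.

```python
import numpy as np

def utilization(answer_prob, relevance_map, masked_prob_fn):
    """Question-conditioned utilization: how much the answer probability
    drops when question-relevant regions are probabilistically masked.
    A large drop means the model was actually using the visual evidence."""
    drop = answer_prob - masked_prob_fn(relevance_map)
    return max(0.0, drop) / max(answer_prob, 1e-8)

# Toy usage: masked_prob_fn would re-query the LVLM on the masked image;
# here a stand-in assumes probability falls with total masked relevance.
rel = np.array([[0.9, 0.1], [0.2, 0.0]])        # illustrative relevance map
u = utilization(0.8, rel, lambda r: 0.8 * np.exp(-r.sum()))
```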
Abstract: A wide variety of real-world data, such as sea measurements (e.g., temperatures collected by distributed sensors) and the trajectories of multiple unmanned aerial vehicles (UAVs), can be naturally represented as graphs, often exhibiting non-Euclidean structures. These graph representations may evolve over time, forming time-varying graphs. Effectively modeling and analyzing such dynamic graph data is critical for tasks like predicting graph evolution and reconstructing missing graph data. In this paper, we propose a framework based on the Koopman autoencoder (KAE) to handle time-varying graph data. Specifically, we assume the existence of a hidden nonlinear dynamical system whose state vector corresponds to the graph embedding of the time-varying graph signals. To capture the evolving graph structures, the graph data is first converted into a vector time series through graph embedding, representing the structural information in a finite-dimensional latent space. In this latent space, the KAE learns the underlying nonlinear dynamics governing the temporal evolution of the graph features, enabling both prediction and reconstruction tasks.
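A minimal PyTorch sketch of a Koopman autoencoder, assuming a single linear operator `K` advances the latent state one step; the dimensions, the random training pair, and the combined reconstruction-plus-prediction loss are illustrative.

```python
import torch

class KoopmanAE(torch.nn.Module):
    """Minimal Koopman autoencoder: nonlinear encoder/decoder around a
    single linear operator K that advances the latent state one step."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.enc = torch.nn.Sequential(torch.nn.Linear(in_dim, 64),
                                       torch.nn.Tanh(),
                                       torch.nn.Linear(64, latent_dim))
        self.K = torch.nn.Linear(latent_dim, latent_dim, bias=False)
        self.dec = torch.nn.Sequential(torch.nn.Linear(latent_dim, 64),
                                       torch.nn.Tanh(),
                                       torch.nn.Linear(64, in_dim))

    def forward(self, x_t):
        z_t = self.enc(x_t)
        # Return (reconstruction of x_t, one-step-ahead prediction).
        return self.dec(z_t), self.dec(self.K(z_t))

# Toy training step on graph-embedding pairs (x_t, x_{t+1}):
model = KoopmanAE(in_dim=16, latent_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_t, x_next = torch.randn(32, 16), torch.randn(32, 16)
recon, pred = model(x_t)
loss = torch.mean((recon - x_t) ** 2) + torch.mean((pred - x_next) ** 2)
opt.zero_grad(); loss.backward(); opt.step()
```

Because `K` is linear, multi-step prediction amounts to repeated application of a single matrix in latent space, which is what enables both the prediction and reconstruction tasks described above.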