Abstract: Agentic artificial intelligence (AI) refers to AI systems that can perceive the environment, reason over observations, and execute actions to achieve specific goals. Task-oriented communication supports agentic AI by transmitting only task-related information instead of the full raw data, thereby reducing the bandwidth requirement. In real-world scenarios, AI agents often need to perform a sequence of actions to complete complex tasks. Completing these long-horizon tasks requires a hierarchical agentic AI architecture, in which a high-level planner module decomposes a task into subtasks and a low-level actor module executes each subtask sequentially. Since each subtask has a distinct goal, existing task-oriented communication schemes, which are designed for a single fixed goal, cannot handle the varying goals of different subtasks. To address this challenge, in this paper we develop a hierarchical task-oriented communication (HiTOC) framework. We consider a system with an edge server and a robot as the edge device. The high-level planner and low-level actor modules reside on the edge server. The robot transmits only the environment information that is relevant to the current subtask in order to complete a long-horizon task. We propose a conditional variational information bottleneck (cVIB) approach to train the HiTOC framework to adaptively transmit the minimal information required for each subtask. Simulations conducted on the AI2-THOR platform demonstrate that the proposed HiTOC framework outperforms three state-of-the-art schemes in terms of the success rate on the MAP-THOR benchmark.
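To make the cVIB idea concrete, the sketch below shows one way a goal-conditioned encoder could be trained with an information-bottleneck penalty. This is a minimal illustration assuming a Gaussian latent with a standard-normal prior; the module and loss names (ConditionalVIBEncoder, cvib_loss) and the weight beta are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class ConditionalVIBEncoder(nn.Module):
    """Hypothetical encoder: maps an observation x and a subtask-goal
    embedding g to a sample from a Gaussian latent q(z | x, g)."""
    def __init__(self, obs_dim, goal_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # mean and log-variance
        )

    def forward(self, x, g):
        mu, logvar = self.net(torch.cat([x, g], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

def cvib_loss(task_loss, mu, logvar, beta=1e-3):
    # KL(q(z|x,g) || N(0, I)) penalizes the information carried by z;
    # beta trades subtask performance against communication rate.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()
    return task_loss + beta * kl
```

Since the KL term upper-bounds the rate of the transmitted representation, sweeping beta traces the trade-off between subtask success and channel usage.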
Abstract: Semantic communication has shown outstanding performance in preserving the overall source information in wireless transmission. For semantically rich content such as images, human users are often interested in specific regions depending on their intent. Moreover, recent semantic coding models are mostly trained on specific datasets, whereas real-world applications may involve images outside the distribution of the training dataset, which makes generalization a crucial but largely unexplored problem. To incorporate the user's intent into semantic coding, in this paper we propose a generalized user-oriented image semantic coding (UO-ISC) framework, in which the user provides a text query indicating their intent. The transmitter extracts features from the source image that are relevant to the user's query, and the receiver reconstructs an image based on those features. To enhance the generalization ability, we integrate the contrastive language-image pre-training (CLIP) model, a pretrained large vision-language model (VLM), into our proposed UO-ISC framework. To evaluate the relevance between the reconstructed image and the user's query, we introduce a user-intent relevance loss, which is computed using a pretrained large VLM, the large language-and-vision assistant (LLaVA) model. When performing zero-shot inference on unseen objects, simulation results show that the proposed UO-ISC framework outperforms a state-of-the-art query-aware image semantic coding scheme in terms of the answer match rate.
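As a rough illustration of query-conditioned feature selection with CLIP, the sketch below scores vision-transformer patch tokens against the text query and keeps only the top-scoring fraction for transmission. It is an assumption-laden stand-in for the paper's learned UO-ISC transmitter: the keep_ratio heuristic and the use of CLIP's visual_projection on individual patch tokens (rather than the pooled output it was trained on) are illustrative choices, not the actual design.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def select_query_relevant_patches(image: Image.Image, query: str, keep_ratio=0.25):
    """Score each ViT patch token against the text query and keep the
    top fraction -- a hypothetical stand-in for a learned feature selector."""
    inputs = processor(text=[query], images=image, return_tensors="pt")
    patch_tokens = model.vision_model(
        pixel_values=inputs["pixel_values"]).last_hidden_state[:, 1:]  # drop CLS token
    text_feat = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    # project patch tokens into the shared embedding space (an approximation:
    # CLIP applies this projection only to the pooled output during training)
    patch_feat = model.visual_projection(patch_tokens)
    sims = torch.cosine_similarity(patch_feat, text_feat[:, None, :], dim=-1)
    k = max(1, int(keep_ratio * sims.shape[1]))
    top_idx = sims.topk(k, dim=1).indices
    return patch_tokens.gather(
        1, top_idx.unsqueeze(-1).expand(-1, -1, patch_tokens.shape[-1]))
```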
Abstract: Multi-task semantic communication (SC) can reduce the computational resources required in wireless systems, since retraining is not needed when switching between tasks. However, existing approaches typically rely on task-specific embeddings to identify the intended task, which necessitates retraining the entire model whenever a new task is introduced. This motivates the need for a multi-task SC system that can handle new tasks without additional training, a capability known as zero-shot learning. Inspired by the superior zero-shot capabilities of large language models (LLMs), we leverage pretrained instruction-tuned LLMs, referred to as fine-tuned language net (FLAN) models, to improve the generalization capability. We incorporate a mixture-of-experts (MoE) architecture into the FLAN model and propose the MoE-FLAN-SC architecture for multi-task SC systems. Our proposed MoE-FLAN-SC architecture can further improve the performance of the FLAN-T5 model without increasing the computational cost. Moreover, we design a multi-task feature extraction module (FEM) that adaptively extracts relevant features across various tasks based on the input features and the signal-to-noise ratio (SNR). Simulation results show that our proposed MoE-FLAN-SC architecture outperforms three state-of-the-art models in terms of the average accuracy on four different unseen tasks.
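To sketch how an MoE layer slots into a FLAN-style transformer, the snippet below implements a token-level top-k routed feed-forward block. The dimensions, expert count, and dense dispatch (every expert is evaluated and then masked, for readability) are illustrative assumptions rather than the paper's MoE-FLAN-SC configuration; in practice, sparse routing is what keeps the computational cost flat as experts are added.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Illustrative mixture-of-experts FFN: a router selects the top-k
    experts per token, so capacity grows without activating every expert."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)      # per-token routing weights
        weights, idx = gates.topk(self.top_k, dim=-1)  # keep top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                          # (batch, seq, top_k)
            if mask.any():
                w = (weights * mask).sum(dim=-1, keepdim=True)
                out = out + w * expert(x)              # dense eval, masked combine
        return out
```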