Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yun Tang

Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data

Jun 23, 2025

Yun Tang, Eesung Kim, Vijendra Raj Apsingekar

Abstract:A joint speech and text optimization method is proposed for hybrid transducer and attention-based encoder decoder (TAED) modeling to leverage large amounts of text corpus and enhance ASR accuracy. The joint TAED (J-TAED) is trained with both speech and text input modalities together, while it only takes speech data as input during inference. The trained model can unify the internal representations from different modalities, and be further extended to text-based domain adaptation. It can effectively alleviate data scarcity for mismatch domain tasks since no speech data is required. Our experiments show J-TAED successfully integrates speech and linguistic information into one model, and reduce the WER by 5.8 ~12.8% on the Librispeech dataset. The model is also evaluated on two out-of-domain datasets: one is finance and another is named entity focused. The text-based domain adaptation brings 15.3% and 17.8% WER reduction on those two datasets respectively.

* Accepted by Interspeech2025

Via

Access Paper or Ask Questions

Building AI Service Repositories for On-Demand Service Orchestration in 6G AI-RAN

Apr 13, 2025

Yun Tang, Mengbang Zou, Udhaya Chandhar Srinivasan, Obumneme Umealor, Dennis Kevogo, Benjamin James Scott, Weisi Guo

Abstract:Efficient orchestration of AI services in 6G AI-RAN requires well-structured, ready-to-deploy AI service repositories combined with orchestration methods adaptive to diverse runtime contexts across radio access, edge, and cloud layers. Current literature lacks comprehensive frameworks for constructing such repositories and generally overlooks key practical orchestration factors. This paper systematically identifies and categorizes critical attributes influencing AI service orchestration in 6G networks and introduces an open-source, LLM-assisted toolchain that automates service packaging, deployment, and runtime profiling. We validate the proposed toolchain through the Cranfield AI Service repository case study, demonstrating significant automation benefits, reduced manual coding efforts, and the necessity of infrastructure-specific profiling, paving the way for more practical orchestration frameworks.

* 6 pages, three figures, one table, submitted to IEEE GlobeCOM 2025 for possible publication

Via

Access Paper or Ask Questions

Contextualized Autonomous Drone Navigation using LLMs Deployed in Edge-Cloud Computing

Apr 01, 2025

Hongqian Chen, Yun Tang, Antonios Tsourdos, Weisi Guo

Abstract:Autonomous navigation is usually trained offline in diverse scenarios and fine-tuned online subject to real-world experiences. However, the real world is dynamic and changeable, and many environmental encounters/effects are not accounted for in real-time due to difficulties in describing them within offline training data or hard to describe even in online scenarios. However, we know that the human operator can describe these dynamic environmental encounters through natural language, adding semantic context. The research is to deploy Large Language Models (LLMs) to perform real-time contextual code adjustment to autonomous navigation. The challenge not evaluated in literature is what LLMs are appropriate and where should these computationally heavy algorithms sit in the computation-communication edge-cloud computing architectures. In this paper, we evaluate how different LLMs can adjust both the navigation map parameters dynamically (e.g., contour map shaping) and also derive navigation task instruction sets. We then evaluate which LLMs are most suitable and where they should sit in future edge-cloud of 6G telecommunication architectures.

Via

Access Paper or Ask Questions

RAG-based User Profiling for Precision Planning in Mixed-precision Over-the-Air Federated Learning

Mar 19, 2025

Jinsheng Yuan, Yun Tang, Weisi Guo

Abstract:Mixed-precision computing, a widely applied technique in AI, offers a larger trade-off space between accuracy and efficiency. The recent purposed Mixed-Precision Over-the-Air Federated Learning (MP-OTA-FL) enables clients to operate at appropriate precision levels based on their heterogeneous hardware, taking advantages of the larger trade-off space while covering the quantization overheads in the mixed-precision modulation scheme for the OTA aggregation process. A key to further exploring the potential of the MP-OTA-FL framework is the optimization of client precision levels. The choice of precision level hinges on multifaceted factors including hardware capability, potential client contribution, and user satisfaction, among which factors can be difficult to define or quantify. In this paper, we propose a RAG-based User Profiling for precision planning framework that integrates retrieval-augmented LLMs and dynamic client profiling to optimize satisfaction and contributions. This includes a hybrid interface for gathering device/user insights and an RAG database storing historical quantization decisions with feedback. Experiments show that our method boosts satisfaction, energy savings, and global model accuracy in MP-OTA-FL systems.

* 5 pages, 4 figures, 2 tables, submitted to IEEE VTC 2025 fall for possible publication

Via

Access Paper or Ask Questions

End-to-End Edge AI Service Provisioning Framework in 6G ORAN

Mar 15, 2025

Yun Tang, Udhaya Chandhar Srinivasan, Benjamin James Scott, Obumneme Umealor, Dennis Kevogo, Weisi Guo

Abstract:With the advent of 6G, Open Radio Access Network (O-RAN) architectures are evolving to support intelligent, adaptive, and automated network orchestration. This paper proposes a novel Edge AI and Network Service Orchestration framework that leverages Large Language Model (LLM) agents deployed as O-RAN rApps. The proposed LLM-agent-powered system enables interactive and intuitive orchestration by translating the user's use case description into deployable AI services and corresponding network configurations. The LLM agent automates multiple tasks, including AI model selection from repositories (e.g., Hugging Face), service deployment, network adaptation, and real-time monitoring via xApps. We implement a prototype using open-source O-RAN projects (OpenAirInterface and FlexRIC) to demonstrate the feasibility and functionality of our framework. Our demonstration showcases the end-to-end flow of AI service orchestration, from user interaction to network adaptation, ensuring Quality of Service (QoS) compliance. This work highlights the potential of integrating LLM-driven automation into 6G O-RAN ecosystems, paving the way for more accessible and efficient edge AI ecosystems.

* 5 pages, 3 figures, submitted to IEEE VTC for possible publication

Via

Access Paper or Ask Questions

Automatic Item Generation for Personality Situational Judgment Tests with Large Language Models

Dec 10, 2024

Chang-Jin Li, Jiyuan Zhang, Yun Tang, Jian Li

Abstract:Personality assessment, particularly through situational judgment tests (SJTs), is a vital tool for psychological research, talent selection, and educational evaluation. This study explores the potential of GPT-4, a state-of-the-art large language model (LLM), to automate the generation of personality situational judgment tests (PSJTs) in Chinese. Traditional SJT development is labor-intensive and prone to biases, while GPT-4 offers a scalable, efficient alternative. Two studies were conducted: Study 1 evaluated the impact of prompt design and temperature settings on content validity, finding that optimized prompts with a temperature of 1.0 produced creative and accurate items. Study 2 assessed the psychometric properties of GPT-4-generated PSJTs, revealing that they demonstrated satisfactory reliability and validity, surpassing the performance of manually developed tests in measuring the Big Five personality traits. This research highlights GPT-4's effectiveness in developing high-quality PSJTs, providing a scalable and innovative method for psychometric test development. These findings expand the possibilities of automatic item generation and the application of LLMs in psychology, and offer practical implications for streamlining test development processes in resource-limited settings.

* Submitted to Organizational Research Methods. 48 pages (main text), 12 pages (appendix), and 3 figures

Via

Access Paper or Ask Questions

Transducer Consistency Regularization for Speech to Text Applications

Oct 09, 2024

Cindy Tseng, Yun Tang, Vijendra Raj Apsingekar

Figure 1 for Transducer Consistency Regularization for Speech to Text Applications

Figure 2 for Transducer Consistency Regularization for Speech to Text Applications

Figure 3 for Transducer Consistency Regularization for Speech to Text Applications

Figure 4 for Transducer Consistency Regularization for Speech to Text Applications

Abstract:Consistency regularization is a commonly used practice to encourage the model to generate consistent representation from distorted input features and improve model generalization. It shows significant improvement on various speech applications that are optimized with cross entropy criterion. However, it is not straightforward to apply consistency regularization for the transducer-based approaches, which are widely adopted for speech applications due to the competitive performance and streaming characteristic. The main challenge is from the vast alignment space of the transducer optimization criterion and not all the alignments within the space contribute to the model optimization equally. In this study, we present Transducer Consistency Regularization (TCR), a consistency regularization method for transducer models. We apply distortions such as spec augmentation and dropout to create different data views and minimize the distribution difference. We utilize occupational probabilities to give different weights on transducer output distributions, thus only alignments close to oracle alignments would contribute to the model learning. Our experiments show the proposed method is superior to other consistency regularization implementations and could effectively reduce word error rate (WER) by 4.3\% relatively comparing with a strong baseline on the \textsc{Librispeech} dataset.

* 8 pages, 4 figures. Accepted in IEEE Spoken Language Technology Workshop 2024

Via

Access Paper or Ask Questions

ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation

Feb 23, 2024

Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao

Abstract:Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.

Via

Access Paper or Ask Questions

Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Jul 17, 2023

Yun Tang, Antonio A. Bruto da Costa, Jason Zhang, Irvine Patrick, Siddartha Khastgir, Paul Jennings

Figure 1 for Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Figure 2 for Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Figure 3 for Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Figure 4 for Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Abstract:Engineering knowledge-based (or expert) systems require extensive manual effort and domain knowledge. As Large Language Models (LLMs) are trained using an enormous amount of cross-domain knowledge, it becomes possible to automate such engineering processes. This paper presents an empirical automation and semi-automation framework for domain knowledge distillation using prompt engineering and the LLM ChatGPT. We assess the framework empirically in the autonomous driving domain and present our key observations. In our implementation, we construct the domain knowledge ontology by "chatting" with ChatGPT. The key finding is that while fully automated domain ontology construction is possible, human supervision and early intervention typically improve efficiency and output quality as they lessen the effects of response randomness and the butterfly effect. We, therefore, also develop a web-based distillation assistant enabling supervision and flexible intervention at runtime. We hope our findings and tools could inspire future research toward revolutionizing the engineering of knowledge-based systems across application domains.

* Accepted by ITSC 2023

Via

Access Paper or Ask Questions

Exploration on HuBERT with Multiple Resolutions

Jun 22, 2023

Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu GOng, Juan Pino, Shinji Watanabe

Figure 1 for Exploration on HuBERT with Multiple Resolutions

Figure 2 for Exploration on HuBERT with Multiple Resolutions

Figure 3 for Exploration on HuBERT with Multiple Resolutions

Figure 4 for Exploration on HuBERT with Multiple Resolutions

Abstract:Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since their attributes (e.g., speaker characteristics and semantics) are based on different time scales. To address this limitation, we propose utilizing HuBERT representations at multiple resolutions for downstream tasks. We explore two approaches, namely the parallel and hierarchical approaches, for integrating HuBERT features with different resolutions. Through experiments, we demonstrate that HuBERT with multiple resolutions outperforms the original model. This highlights the potential of utilizing multiple resolutions in SSL models like HuBERT to capture diverse information from speech signals.

* Accepted to Interspeech2023

Via

Access Paper or Ask Questions