Abstract: Data augmentation is a promising technique for unsupervised anomaly detection in industrial applications, where the availability of positive samples is often limited due to factors such as commercial competition and sample collection difficulties. In this paper, we study how to effectively select and apply data augmentation methods for unsupervised anomaly detection, systematically investigating the impact of various data augmentation methods on different anomaly detection algorithms through experiments. The experimental results show that the performance of different industrial image anomaly detection (IAD) algorithms is not significantly affected by the specific data augmentation method employed, and that combining multiple data augmentation methods does not necessarily yield further improvements in anomaly detection accuracy, although it can achieve excellent results with specific algorithms. These findings provide useful guidance for selecting appropriate data augmentation methods for different requirements in IAD.
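A minimal sketch (not from the paper) of how a single augmentation and a combination of augmentations might be set up for such a comparison, using torchvision; the specific transforms chosen here (flip, rotation, color jitter) are illustrative assumptions, not the augmentations evaluated in the study.

```python
# Sketch: single vs. combined augmentation for an IAD training pipeline.
# The concrete transforms below are assumptions for illustration only.
from torchvision import transforms

single_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

combined_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# In the spirit of the study, the same IAD algorithm would be trained once
# with `single_aug` and once with `combined_aug`, and detection accuracy
# compared; the reported finding is that combining methods does not
# necessarily improve results.
```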
Abstract: We propose a semi-supervised network for wide-angle portrait correction. Wide-angle images often suffer from skew and distortion caused by perspective distortion, which is especially noticeable in face regions. Previous deep-learning-based approaches require ground-truth correction flow maps as training guidance. However, such labels are expensive, since they can only be obtained manually. In this work, we propose a semi-supervised scheme that can consume unlabeled data in addition to labeled data for further improvements. Specifically, our semi-supervised scheme takes advantage of a consistency mechanism, with several novel components such as direction and range consistency (DRC) and regression consistency (RC). Furthermore, our network, named Multi-Scale Swin-Unet (MS-Unet), is built upon the multi-scale Swin transformer block (MSTB), which can effectively learn both local-scale and long-range semantic information. In addition, we introduce a high-quality unlabeled dataset with rich scenarios for training. Extensive experiments demonstrate that the proposed method is superior to state-of-the-art methods and other representative baselines.
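A minimal sketch, under stated assumptions, of a generic consistency-regularization training step in the spirit of the regression consistency (RC) term; the paper's actual DRC/RC formulations, augmentations, and loss weights are not reproduced here, and `model`, `weak_aug`, and `strong_aug` are hypothetical placeholders.

```python
# Sketch of a semi-supervised step combining a supervised flow-regression
# loss with a consistency loss on unlabeled images. Photometric-only
# augmentations are assumed so both views share the same correction flow.
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_img, gt_flow,
                         unlabeled_img, weak_aug, strong_aug, lam=0.5):
    # Supervised term: predicted correction flow vs. ground-truth flow map.
    sup_loss = F.l1_loss(model(labeled_img), gt_flow)

    # Consistency term: the prediction on a weakly augmented view serves as
    # a pseudo target for the strongly augmented view of the same image.
    with torch.no_grad():
        target_flow = model(weak_aug(unlabeled_img))
    consistency_loss = F.l1_loss(model(strong_aug(unlabeled_img)), target_flow)

    return sup_loss + lam * consistency_loss
```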
Abstract: Subtext is a kind of deep semantics that can be acquired only after one or more rounds of expression transformation. As a popular way of expressing one's intentions, it is well worth studying. In this paper, we try to make computers understand whether a text contains subtext by means of machine learning. We build a Chinese dataset whose source data comes from popular social media platforms (e.g., Weibo, Netease Music, Zhihu, and Bilibili). In addition, we build a baseline model called SASICM for subtext recognition. The F1 score of SASICMg, which uses GloVe as its pretrained model, reaches 64.37%: 3.97% higher than that of a BERT-based model; 12.7% higher on average than traditional methods, including the support vector machine, logistic regression, maximum entropy, naive Bayes, and decision tree classifiers; and 2.39% higher than the state-of-the-art methods MARIN and BTM. The F1 score of SASICMBERT, which uses BERT as its pretrained model, is 65.12%, 0.75% higher than that of SASICMg. The accuracy rates of SASICMg and SASICMBERT are 71.16% and 70.76%, respectively, which are competitive with those of the aforementioned methods.
Abstract: We present a simple yet elegant solution for training a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS). Our novel design requires no private layers in the model architecture; instead, it introduces two artificial tokens at the beginning and end of the input sentence to specify the required target criterion. The rest of the model, including the Long Short-Term Memory (LSTM) layer and the Conditional Random Field (CRF) layer, remains unchanged and is shared across all datasets, keeping the parameter count minimal and constant. On the Bakeoff 2005 and Bakeoff 2008 datasets, our design surpasses both single-criterion and multi-criteria state-of-the-art results. To the best of our knowledge, our design is the first to achieve state-of-the-art performance on such large-scale datasets. The source code and corpora of this paper are available on GitHub.
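A minimal sketch of the input scheme described above: the segmentation criterion is conveyed only through two artificial tokens wrapped around each sentence, while the shared LSTM-CRF tagger itself is untouched; the token spellings (e.g. "<pku>") are illustrative assumptions, not the paper's exact vocabulary entries.

```python
# Sketch: encode the target criterion as two artificial boundary tokens so
# one shared model can be trained on a mixture of multi-criteria corpora.
def wrap_with_criterion(chars, criterion):
    """Prepend/append criterion tokens, e.g. ['<pku>', '上', '海', '</pku>']."""
    return [f"<{criterion}>"] + list(chars) + [f"</{criterion}>"]

# All datasets are merged into one training set; the shared LSTM-CRF tagger
# sees the required criterion only through these two extra tokens.
pku_example = wrap_with_criterion("上海浦东开发", criterion="pku")
msr_example = wrap_with_criterion("上海浦东开发", criterion="msr")
```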
Abstract: Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). However, many non-Latin languages have hieroglyphic writing systems involving large alphabets with thousands or even millions of characters, where each character is composed of even smaller parts that previous work has often ignored. In this paper, we propose a novel architecture employing two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character-level representations and capture deeper levels of semantic meaning. To conduct a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a case study. Chinese is a typical case among these languages: every character contains several components called radicals. Our networks employ shared radical-level embeddings to solve both Simplified and Traditional Chinese Word Segmentation without extra Traditional-to-Simplified conversion; in this highly end-to-end manner, word segmentation is significantly simplified compared with previous work. Radical-level embeddings also capture deeper semantic meaning below the character level and improve learning performance. By tying radical and character embeddings together, the parameter count is reduced while semantic knowledge is shared and transferred between the two levels, largely boosting performance. On 3 out of 4 Bakeoff 2005 datasets, our method surpasses state-of-the-art results by up to 0.4%. Our results are reproducible; the source code and corpora are available on GitHub.
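A minimal PyTorch sketch, under stated assumptions, of the two-level idea: a lower LSTM summarizes each character from its radicals, an upper bidirectional LSTM reads the character sequence, and a single embedding table is tied across radicals and characters; layer sizes, the tag set, and all class names are hypothetical rather than the paper's exact configuration.

```python
# Sketch: stacked LSTMs over radicals and characters with a shared
# (tied) embedding table for both levels.
import torch
import torch.nn as nn

class RadicalCharSegmenter(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_tags=4):
        super().__init__()
        self.shared_emb = nn.Embedding(vocab_size, emb_dim)   # radicals + characters
        self.radical_lstm = nn.LSTM(emb_dim, emb_dim, batch_first=True)
        self.char_lstm = nn.LSTM(2 * emb_dim, hidden,
                                 batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * hidden, num_tags)          # e.g. B/M/E/S tags

    def forward(self, char_ids, radical_ids):
        # char_ids: (batch, seq_len); radical_ids: (batch, seq_len, max_radicals)
        b, t, r = radical_ids.shape
        rad_emb = self.shared_emb(radical_ids).view(b * t, r, -1)
        _, (h, _) = self.radical_lstm(rad_emb)                  # sub-character summary
        sub_char = h[-1].view(b, t, -1)
        char_emb = self.shared_emb(char_ids)                    # character-level embedding
        feats, _ = self.char_lstm(torch.cat([char_emb, sub_char], dim=-1))
        return self.tagger(feats)                               # emissions for a CRF/softmax
```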