Andrew Starnes

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

Oct 09, 2023

Andrew Starnes, Anton Dereventsov, Clayton Webster

Abstract: In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldom, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy, which encourage current policies to follow different state visitation and/or action choice distributions than previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and that its use in gradient-based approaches significantly improves performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.

* 8 pages, 3 figures, accepted to WAIN 2023
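
To make the idea in the abstract above concrete, here is a minimal sketch of a divergence-based diversity term. It uses the KL divergence (the $\varphi$-divergence with $\varphi(t) = t \log t$) between a current and a previous action distribution. The function names, the additive form of the objective, and the `weight` parameter are assumptions for illustration; the abstract does not state the paper's exact objective.

```python
import math


def softmax(logits):
    """Convert raw action preferences into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a discrete action set; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def regularized_objective(expected_reward, current_policy, previous_policy, weight=0.1):
    """Hypothetical objective: reward plus a bonus for differing from an
    earlier policy. The additive form and the weight are assumptions."""
    return expected_reward + weight * kl_divergence(current_policy, previous_policy)


previous = softmax([5.0, 0.0, 0.0])  # an earlier policy that has nearly collapsed
current = softmax([1.0, 0.5, 0.5])   # a more spread-out candidate policy
bonus = kl_divergence(current, previous)
```

Under these assumptions, a candidate policy that matches the previous one earns no bonus, while one that spreads probability over other actions earns a positive bonus.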

Modeling Non-deterministic Human Behaviors in Discrete Food Choices

Jan 23, 2023

Andrew Starnes, Anton Dereventsov, E. Susanne Blazek, Folasade Phillips

Abstract: We establish a non-deterministic model that predicts a user's food preferences from their demographic information. Our simulator is based on the NHANES dataset and domain expert knowledge in the form of established behavioral studies. Our model can be used to generate an arbitrary number of synthetic datapoints that are similar in distribution to the original dataset and align with behavioral science expectations. Such a simulator can be used in a variety of machine learning tasks, especially in applications requiring human behavior prediction.

* 6 pages, 4 figures, published in 2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks

Nov 21, 2022

Anton Dereventsov, Andrew Starnes, Clayton G. Webster

Abstract: This effort is focused on examining the behavior of reinforcement learning systems in personalization environments and detailing the differences in policy entropy associated with the type of learning algorithm utilized. We demonstrate that Policy Optimization agents often possess low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others. Conversely, we also show that Q-Learning agents are far less susceptible to such behavior and generally maintain high-entropy policies throughout training, which is often preferable in real-world applications. We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.
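
The quantity this abstract discusses, policy entropy, can be illustrated with a small sketch. It computes the Shannon entropy of a discrete action distribution and compares a uniform policy with a sharply peaked one. The logits are made-up values for illustration, not results from the paper.

```python
import math


def softmax(logits):
    """Convert raw action preferences into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative logits only: equal preferences vs. one dominant action.
uniform_policy = softmax([0.0, 0.0, 0.0, 0.0])
peaked_policy = softmax([10.0, 0.0, 0.0, 0.0])

high_entropy = policy_entropy(uniform_policy)  # equals log(4), the maximum for 4 actions
low_entropy = policy_entropy(peaked_policy)    # close to zero: three actions are almost never chosen
```

A policy whose entropy sits near zero corresponds to the behavior the abstract describes, where an agent prioritizes certain actions and avoids the rest.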
