Abstract:Recent advancements in deep reinforcement learning have brought forth an impressive display of highly skilled artificial agents capable of complex intelligent behavior. In video games, these artificial agents are increasingly deployed as non-playable characters (NPCs) designed to enhance the experience of human players. However, while it has been shown that the convincing human-like behavior of NPCs leads to increased engagement in video games, the believability of an artificial agent's behavior is most often measured solely by its proficiency at a given task. Recent work has hinted that proficiency alone is not sufficient to discern human-like behavior. Motivated by this, we build a non-parametric two-sample hypothesis test designed to compare the behaviors of artificial agents to those of human players. We show that the resulting $p$-value not only aligns with anonymous human judgment of human-like behavior, but also that it can be used as a measure of similarity.