Abstract:Sex trafficking is a global epidemic. Escort websites are a primary vehicle for selling the services of such trafficking victims and thus a major driver of trafficker revenue. Many law enforcement agencies do not have the resources to manually identify leads from the millions of escort ads posted across dozens of public websites. We propose an ordinal regression neural network to identify escort ads that are likely linked to sex trafficking. Our model uses a modified cost function to mitigate inconsistencies in predictions often associated with nonparametric ordinal regression and leverages recent advancements in deep learning to improve prediction accuracy. The proposed method significantly improves on the previous state-of-the-art on Trafficking-10K, an expert-annotated dataset of escort ads. Additionally, because traffickers use acronyms, deliberate typographical errors, and emojis to replace explicit keywords, we demonstrate how to expand the lexicon of trafficking flags through word embeddings and t-SNE.
Abstract:In this paper, we show the convergence rates of posterior distributions of the model dynamics in a MDP for both episodic and continuous tasks. The theoretical results hold for general state and action space and the parameter space of the dynamics can be infinite dimensional. Moreover, we show the convergence rates of posterior distributions of the mean accumulative reward under a fixed or the optimal policy and of the regret bound. A variant of Thompson sampling algorithm is proposed which provides both posterior convergence rates for the dynamics and the regret-type bound. Then the previous results are extended to Markov games. Finally, we show numerical results with three simulation scenarios and conclude with discussions.
Abstract:We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. Unlike common heuristics for exploration, our method accounts for the time horizon of the decision problem as well as the agent's current state of knowledge of the dynamics of the decision problem. We show our method as applied to several common exploration techniques has superior performance relative to un-tuned counterparts in Bernoulli and Gaussian multi-armed bandits, contextual bandits, and a Markov decision process based on a mobile health (mHealth) study. We also examine the effects of the accuracy of the estimated dynamics model on the performance of PE.