Since the 2016 US Presidential election, social media abuse has elicited massive concern in the academic community and beyond. Preventing and limiting the malicious activity of users such as trolls and bots in their manipulation campaigns is of paramount importance for the integrity of democracy, public health, and more. However, the automated detection of troll accounts remains an open challenge. In this work, we propose an approach based on Inverse Reinforcement Learning (IRL) to capture troll behavior and identify troll accounts. We employ IRL to infer a set of online incentives that may steer user behavior; these inferred incentives in turn highlight behavioral differences between troll and non-troll accounts, enabling their accurate classification. We report promising results: the IRL-based approach accurately detects troll accounts (AUC = 89.1%). The differences in the predictive features between the two classes of accounts enable a principled understanding of the distinctive behaviors reflecting the incentives that trolls and non-trolls respond to.