The ability of an autonomous agent or robot to track and identify multiple objects in a dynamic environment is essential for many applications, such as automated surveillance, traffic monitoring, and human-robot interaction. The main challenge stems from noisy and incomplete perception, including the inevitable false negatives and false positives produced by a low-level detector. In this paper, we propose a novel approach to multi-object tracking and identification over sets to address this challenge. We define both joint states and observations as finite sets and develop the corresponding motion and observation functions. The object identification problem is then formulated and solved using expectation-maximization (EM) methods. The set formulation allows us to avoid performing explicit observation-to-object association. We show empirically that the overall algorithm outperforms the state of the art on a popular PETS dataset.