In various game scenarios, selecting a fixed number of targets from multiple enemy units is an extremely challenging task. This difficulty stems from the complex relationship between the threat levels of enemy units and their feature characteristics, which complicates the design of rule-based evaluators. Moreover, traditional supervised learning methods face the challenge of lacking explicit labels during training when applied to this threat evaluation problem. In this study, we redefine the threat evaluation problem as a reinforcement learning task and introduce an efficient evaluator training algorithm, Eval-PPO, based on the Proximal Policy Optimization (PPO) algorithm. Eval-PPO integrates multidimensional enemy features and the state information of friendly units through systematic training, thereby achieving precise threat assessment. Compared with rule-based methods, Eval-PPO demonstrates a significant improvement in average success rate, with an increase of 17.84%.