Detecting faults, and choosing the best methods for doing so, is increasingly important in industrial and real-world systems. We seek trustworthy and practical data-driven fault detection methods drawn from artificial intelligence. In this paper, we propose a fault detection framework based on reinforcement learning with the proximal policy optimization (PPO) algorithm. Because fault data are scarce, a significant weakness of standard PPO is its poor detection of the fault classes; we address this by modifying its cost function. The modified PPO improves performance, overcomes data imbalance, and better predicts future faults. Compared with standard PPO, the modified version increases all evaluation metrics by $3\%$ to $4\%$ on the first benchmark, by $20\%$ to $55\%$ on the second, and by $6\%$ to $14\%$ on the third. It also improves performance and prediction speed relative to previous methods.
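The abstract does not say how the cost function is modified. As an illustration only, the sketch below shows one common way to counter class imbalance when fault detection is cast as an RL classification task. It is not the authors' method. The function names (`inverse_frequency_weights`, `classification_reward`) and the inverse-frequency weighting are hypothetical assumptions. `ppo_clip_objective` is the standard PPO clipped surrogate, written in plain Python for clarity.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical weighting: w_c = N / (K * n_c), so rare (fault) classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def classification_reward(action, label, weights):
    """Hypothetical reward: +w_label for a correct prediction, -w_label for a wrong one.

    Scaling by the class weight makes mistakes on rare fault classes cost more.
    """
    w = weights[label]
    return w if action == label else -w


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)    # clip ratio to [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped * adv)           # pessimistic (lower) bound
    return total / len(advantages)
```

For example, with 8 normal samples (class `0`) and 2 fault samples (class `1`), the fault class receives weight 2.5 and the normal class 0.625. Misclassifying a fault is then penalized four times as heavily as misclassifying a normal sample.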