Abstract:In addition to signature-based and heuristics-based detection techniques, machine learning (ML) is widely used to generalize to new, never-before-seen malicious software (malware). However, it has been demonstrated that ML models can be fooled by tricking the classifier into returning the incorrect label. These studies, for instance, usually rely on a prediction score that is fragile to gradient-based attacks. In the context of a more realistic situation where an attacker has very little information about the outputs of a malware detection engine, modest evasion rates are achieved. In this paper, we propose a method using reinforcement learning with DQN and REINFORCE algorithms to challenge two state-of-the-art ML-based detection engines (MalConv \& EMBER) and a commercial AV classified by Gartner as a leader AV. Our method combines several actions, modifying a Windows portable execution (PE) file without breaking its functionalities. Our method also identifies which actions perform better and compiles a detailed vulnerability report to help mitigate the evasion. We demonstrate that REINFORCE achieves very good evasion rates even on a commercial AV with limited available information.