Abstract:Malware detection is an ever-present challenge for all organizational gatekeepers. Organizations often deploy numerous different malware detection tools, and then combine their output to produce a final classification for an inspected file. This approach has two significant drawbacks. First, it requires large amounts of computing resources and time since every incoming file needs to be analyzed by all detectors. Secondly, it is difficult to accurately and dynamically enforce a predefined security policy that comports with the needs of each organization (e.g., how tolerant is the organization to false negatives and false positives). In this study we propose ASPIRE, a reinforcement learning (RL)-based method for malware detection. Our approach receives the organizational policy -- defined solely by the perceived costs of correct/incorrect classifications and of computing resources -- and then dynamically assigns detection tools and sets the detection threshold for each inspected file. We demonstrate the effectiveness and robustness of our approach by conducting an extensive evaluation on multiple organizational policies. ASPIRE performed well in all scenarios, even achieving near-optimal accuracy of 96.21% (compared to an optimum of 96.86%) at approximately 20% of the running time of this baseline.