Abstract:Machine learning (ML) has demonstrated significant advancements in Android malware detection (AMD); however, the resilience of ML against realistic evasion attacks remains a major obstacle for AMD. One of the primary factors contributing to this challenge is the scarcity of reliable generalizations. Malware classifiers with limited generalizability tend to overfit spurious correlations derived from biased features. Consequently, adversarial examples (AEs), generated by evasion attacks, can modify these features to evade detection. In this study, we propose a domain adaptation technique to improve the generalizability of AMD by aligning the distribution of malware samples and AEs. Specifically, we utilize meaningful feature dependencies, reflecting domain constraints in the feature space, to establish a robust feature space. Training on the proposed robust feature space enables malware classifiers to learn from predefined patterns associated with app functionality rather than from individual features. This approach helps mitigate spurious correlations inherent in the initial feature space. Our experiments conducted on DREBIN, a renowned Android malware detector, demonstrate that our approach surpasses the state-of-the-art defense, Sec-SVM, when facing realistic evasion attacks. In particular, our defense can improve adversarial robustness by up to 55% against realistic evasion attacks compared to Sec-SVM.
Abstract:On the modern web, trackers and advertisers frequently construct and monetize users' detailed behavioral profiles without consent. Despite various studies on web tracking mechanisms and advertisements, there has been no rigorous study focusing on websites targeted at children. To address this gap, we present a measurement of tracking and (targeted) advertising on websites directed at children. Motivated by lacking a comprehensive list of child-directed (i.e., targeted at children) websites, we first build a multilingual classifier based on web page titles and descriptions. Applying this classifier to over two million pages, we compile a list of two thousand child-directed websites. Crawling these sites from five vantage points, we measure the prevalence of trackers, fingerprinting scripts, and advertisements. Our crawler detects ads displayed on child-directed websites and determines if ad targeting is enabled by scraping ad disclosure pages whenever available. Our results show that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements--a practice that should require verifiable parental consent. Next, we identify improper ads on child-directed websites by developing an ML pipeline that processes both images and text extracted from ads. The pipeline allows us to run semantic similarity queries for arbitrary search terms, revealing ads that promote services related to dating, weight loss, and mental health; as well as ads for sex toys and flirting chat services. Some of these ads feature repulsive and sexually explicit imagery. In summary, our findings indicate a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites. To protect children and create a safer online environment, regulators and stakeholders must adopt and enforce more stringent measures.
Abstract:Strengthening the robustness of machine learning-based malware detectors against realistic evasion attacks remains one of the major obstacles for Android malware detection. To that end, existing work has focused on interpreting domain constraints of Android malware in the problem space, where problem-space realizable adversarial examples are generated. In this paper, we provide another promising way to achieve the same goal but based on interpreting the domain constraints in the feature space, where feature-space realizable adversarial examples are generated. Specifically, we present a novel approach to extracting feature-space domain constraints by learning meaningful feature dependencies from data, and applying them based on a novel robust feature space. Experimental results successfully demonstrate the effectiveness of our novel robust feature space in providing adversarial robustness for DREBIN, a state-of-the-art Android malware detector. For example, it can decrease the evasion rate of a realistic gradient-based attack by $96.4\%$ in a limited-knowledge (transfer) setting and by $13.8\%$ in a more challenging, perfect-knowledge setting. In addition, we show that directly using our learned domain constraints in the adversarial retraining framework leads to about $84\%$ improvement in a limited-knowledge setting, with up to $377\times$ faster implementation than using problem-space adversarial examples.
Abstract:Over the last decade, several studies have investigated the weaknesses of Android malware detectors against adversarial examples by proposing novel evasion attacks; however, the practicality of most studies in manipulating real-world malware is arguable. The majority of studies have assumed attackers know the details of the target classifiers used for malware detection, while in real life, malicious actors have limited access to the target classifiers. This paper presents a practical evasion attack, EvadeDroid, to circumvent black-box Android malware detectors. In addition to generating real-world adversarial malware, the proposed evasion attack can preserve the functionality of the original malware samples. EvadeDroid applies a set of functionality-preserving transformations to morph malware instances into benign ones using an iterative and incremental manipulation strategy. The proposed manipulation technique is a novel, query-efficient optimization algorithm with the aim of finding and injecting optimal sequences of transformations into malware samples. Our empirical evaluation demonstrates the efficacy of EvadeDroid under hard- and soft-label attacks. Moreover, EvadeDroid is capable to generate practical adversarial examples with only a small number of queries, with evasion rate of 81%, 73%, and 75% for DREBIN, Sec-SVM, and MaMaDroid, respectively. Finally, we show that EvadeDroid is able to preserve its stealthiness against four popular commercial antivirus, thus demonstrating its feasibility in the real world.
Abstract:Android remains an attractive target for malware authors and as such, the mobile platform is still highly prone to infections caused by malicious applications. To tackle this problem, malware classifiers leveraging machine learning techniques have been proposed, with varying degrees of success. They often need to rely on a large, diverse set of features -- which are indicative of apps installed by users. This, in turn, raises privacy concerns as it has been shown that features used to train and test machine learning models can provide insights into user's preferences. As such, there is a need for a decentralized, privacy-respecting Android malware classifier which can protect users from both malware infections and the misuse of private, sensitive information stored on their mobile devices. To fill this gap, we propose LiM -- a malware classification framework which leverages the power of Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Data about newly installed apps is kept locally on the users' devices while users benefit from the learning process from each other, and the service provider cannot infer which apps were installed by each user. To realize such classifier in a setting where users cannot provide ground truth (i.e. they cannot tell whether an app is malicious), we use a safe semi-supervised ensemble that maximizes the increase on classification accuracy with respect to a baseline classifier the service provider trains. We implement LiM and show that the cloud has F1 score of 95%, while clients have perfect recall with only 1 false positive in >100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users and 50 rounds of federation. Furthermore, we also conducted a security analysis to demonstrate that LiM remains robust against poisoning attacks.