Abstract:As a widely deployed security scheme, text-based CAPTCHAs have become more and more difficult to resist machine learning-based attacks. So far, many researchers have conducted attacking research on text-based CAPTCHAs deployed by different companies (such as Microsoft, Amazon, and Apple) and achieved certain results.However, most of these attacks have some shortcomings, such as poor portability of attack methods, requiring a series of data preprocessing steps, and relying on large amounts of labeled CAPTCHAs. In this paper, we propose an efficient and simple end-to-end attack method based on cycle-consistent generative adversarial networks. Compared with previous studies, our method greatly reduces the cost of data labeling. In addition, this method has high portability. It can attack common text-based CAPTCHA schemes only by modifying a few configuration parameters, which makes the attack easier. Firstly, we train CAPTCHA synthesizers based on the cycle-GAN to generate some fake samples. Basic recognizers based on the convolutional recurrent neural network are trained with the fake data. Subsequently, an active transfer learning method is employed to optimize the basic recognizer utilizing tiny amounts of labeled real-world CAPTCHA samples. Our approach efficiently cracked the CAPTCHA schemes deployed by 10 popular websites, indicating that our attack is likely very general. Additionally, we analyzed the current most popular anti-recognition mechanisms. The results show that the combination of more anti-recognition mechanisms can improve the security of CAPTCHA, but the improvement is limited. Conversely, generating more complex CAPTCHAs may cost more resources and reduce the availability of CAPTCHAs.