The rapid advancement of quantum computing (QC) and machine learning (ML) has given rise to the burgeoning field of quantum machine learning (QML), aiming to capitalize on the strengths of quantum computing to propel ML forward. Despite its promise, crafting effective QML models necessitates profound expertise to strike a delicate balance between model intricacy and feasibility on Noisy Intermediate-Scale Quantum (NISQ) devices. While complex models offer robust representation capabilities, their extensive circuit depth may impede seamless execution on extant noisy quantum platforms. In this paper, we address this quandary of QML model design by employing deep reinforcement learning to explore proficient QML model architectures tailored for designated supervised learning tasks. Specifically, our methodology involves training an RL agent to devise policies that facilitate the discovery of QML models without predetermined ansatz. Furthermore, we integrate an adaptive mechanism to dynamically adjust the learning objectives, fostering continuous improvement in the agent's learning process. Through extensive numerical simulations, we illustrate the efficacy of our approach within the realm of classification tasks. Our proposed method successfully identifies VQC architectures capable of achieving high classification accuracy while minimizing gate depth. This pioneering approach not only advances the study of AI-driven quantum circuit design but also holds significant promise for enhancing performance in the NISQ era.