Selecting hyperparameters for deep neural networks is challenging because the search space is large and each new layer type introduces its own hyperparameters. Neural architecture selection for convolutional neural networks (CNNs) in spectrum sensing has received little attention. Here, we develop a method based on reinforcement learning, specifically Q-learning, to systematically search and evaluate architectures on generated datasets covering different signals and channels in the spectrum sensing problem. Extensive simulations show that the CNN-based detectors found by the developed method outperform several detectors from the literature. For the most complex dataset, the proposed approach improves accuracy by 9% at the cost of higher computational complexity. Furthermore, we propose a novel method that uses a multi-armed bandit model to select the sensing time, aiming for higher throughput and accuracy while minimizing energy consumption. The method dynamically adjusts the sensing time under time-varying channel conditions without prior information. We demonstrate through a simulated scenario that the proposed method improves the achieved reward by about 20% compared with conventional policies. Consequently, this study effectively manages the selection of important hyperparameters for CNN-based detectors, offering superior performance for cognitive radio networks.
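To make the bandit formulation of sensing-time selection concrete, the following is a minimal illustrative sketch, not the method evaluated in this study. It assumes the standard UCB1 rule, which targets stationary rewards, whereas the setting described above is time-varying. The candidate sensing times, the frame length, and the reward in `toy_reward` (a detection success scaled by the fraction of the frame left for transmission) are hypothetical placeholders chosen only for illustration.

```python
import math
import random


def ucb1_select(counts, values, t, c=2.0):
    """Return the index of the arm chosen by the UCB1 rule at round t."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[a] + math.sqrt(c * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def update(counts, values, arm, reward):
    """Update the play count and running mean reward of one arm in place."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]


def toy_reward(tau, rng, frame=10.0):
    """Hypothetical reward (an assumption, not the reward used in the study).

    Longer sensing raises the chance of a correct decision but leaves
    less of the frame for data transmission.
    """
    p_correct = 1.0 - math.exp(-tau)
    success = 1.0 if rng.random() < p_correct else 0.0
    return success * (1.0 - tau / frame)


def run_bandit(sensing_times, reward_fn, rounds, seed=0):
    """Run UCB1 over candidate sensing times; return (counts, mean rewards)."""
    rng = random.Random(seed)
    counts = [0] * len(sensing_times)
    values = [0.0] * len(sensing_times)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = reward_fn(sensing_times[arm], rng)
        update(counts, values, arm, reward)
    return counts, values


if __name__ == "__main__":
    times = [0.5, 2.0, 8.0]  # hypothetical candidate sensing times
    counts, values = run_bandit(times, toy_reward, rounds=2000)
    for tau, n, v in zip(times, counts, values):
        print(f"tau={tau}: plays={n}, mean reward={v:.3f}")
```

Each arm corresponds to one candidate sensing time, and the learner needs no prior channel information because it estimates the mean reward of each arm from its own observations. Tracking a time-varying channel would require a non-stationary variant, for example one that discounts older rewards or averages over a sliding window.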