Millimeter-wave (mmWave) networks, integral to 5G communication, offer a vast spectrum that addresses the issue of spectrum scarcity and enhances peak rate and capacity. However, their dense deployment, necessary to counteract propagation losses, leads to high power consumption. An effective strategy to reduce this energy consumption in mobile networks is the sleep mode optimization (SMO) of base stations (BSs). In this paper, we propose a novel SMO approach for mmWave BSs in a 3D urban environment. This approach, which incorporates a neural network (NN) based contextual multi-armed bandit (C-MAB) with an epsilon decay algorithm, accommodates the dynamic and diverse traffic of user equipment (UE) by clustering the UEs in their respective tracking areas (TAs). Our strategy includes beamforming, which helps reduce energy consumption from the UE side, while SMO minimizes energy use from the BS perspective. We extended our investigation to include Random, Epsilon Greedy, Upper Confidence Bound (UCB), and Load Based sleep mode (SM) strategies. We compared the performance of our proposed C-MAB based SM algorithm with those of All On and other alternative approaches. Simulation results show that our proposed method outperforms all other SM strategies in terms of the $10^{th}$ percentile of user rate and average throughput while demonstrating comparable average throughput to the All On approach. Importantly, it outperforms all approaches in terms of energy efficiency (EE).