Demand for underwater communication is growing rapidly in areas such as underwater resource exploration, marine expeditions, and environmental research, yet wireless communication underwater faces many problems because of the characteristics of the underwater environment. In underwater wireless networks in particular, the distances between nodes cause unavoidable delay and spatial unfairness. To address these problems, this paper proposes a new method based on ALOHA-Q. The proposed method starts from a random NAV value. A reward is then obtained from the environment according to whether communication succeeds or fails, and the NAV value is set from that reward. The model minimizes the use of energy and computing resources in underwater wireless networks while learning and setting NAV values through reinforcement learning. The simulation results show that the NAV values adapt to the environment and settle on the value best suited to the circumstances, so the problems of unnecessary delay and spatial unfairness can be resolved. In the simulations, the NAV time decreased by 17.5% compared with the original NAV.
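The learning loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the candidate NAV values (in slots), the toy channel `toy_channel`, the per-slot reward penalty, the epsilon-greedy exploration, and all parameter values are assumptions made only for illustration. The update Q <- Q + alpha * (r - Q) is the stateless form usually associated with ALOHA-Q.

```python
import random


def toy_channel(nav_slots):
    """Hypothetical channel: a transmission succeeds only if the NAV
    covers at least 2 slots of propagation delay (assumed value)."""
    return nav_slots >= 2


def learn_nav(candidates, try_send, rounds=5000, alpha=0.1,
              epsilon=0.1, slot_penalty=0.1, seed=0):
    """Learn a NAV value with a stateless Q-update per candidate."""
    rng = random.Random(seed)
    q = {nav: 0.0 for nav in candidates}
    nav = rng.choice(candidates)  # start from a random NAV value
    for _ in range(rounds):
        if try_send(nav):
            # Assumption: success is rewarded, minus a small cost per
            # NAV slot so that shorter successful NAVs score higher.
            reward = 1.0 - slot_penalty * nav
        else:
            reward = -1.0
        # Move the estimate for this NAV toward the observed reward.
        q[nav] += alpha * (reward - q[nav])
        # Epsilon-greedy choice of the NAV for the next round.
        if rng.random() < epsilon:
            nav = rng.choice(candidates)
        else:
            nav = max(candidates, key=lambda v: q[v])
    best = max(candidates, key=lambda v: q[v])
    return best, q


best_nav, q_values = learn_nav([1, 2, 3, 4, 5], toy_channel)
print("learned NAV (slots):", best_nav)
for nav_value, estimate in sorted(q_values.items()):
    print(nav_value, round(estimate, 3))
```

In this toy setting the node keeps one scalar estimate per candidate NAV and needs only one addition and one multiplication per transmission, which is in the spirit of the low energy and computing cost the abstract emphasizes.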