Visible light communication (VLC) is a promising solution to satisfy the extreme demands of emerging applications. VLC offers bandwidth that is orders of magnitude higher than that offered by the radio spectrum, and making the best use of these resources is not a trivial matter. There is growing interest in making next-generation communication networks intelligent, using AI-based tools to automate resource management and adapt to variations in the network, as opposed to conventional handcrafted schemes based on mathematical models that assume prior knowledge of the network. In this article, a reinforcement learning (RL) scheme is developed to intelligently allocate the resources of an optical wireless communication (OWC) system in a heterogeneous network (HetNet) environment. The goal is to maximise the total reward of the system, defined as the sum rate of all users. The results of the RL scheme are compared with those of an optimization scheme based on a mixed-integer linear programming (MILP) model.