Visible light communication (VLC) has been widely applied as a promising solution for modern short range communication. When it comes to the deployment of LED arrays in VLC networks, the emerging ultra-dense network (UDN) technology can be adopted to expand the VLC network's capacity. However, the problem of inter-cell interference (ICI) mitigation and efficient power control in the VLC-based UDN is still a critical challenge. To this end, a reinforcement learning (RL) based VLC UDN architecture is devised in this paper. The deployment of the cells is optimized via spatial reuse to mitigate ICI. An RL-based algorithm is proposed to dynamically optimize the policy of power and interference control, maximizing the system utility in the complicated and dynamic environment. Simulation results demonstrate the superiority of the proposed scheme, it increase the system utility and achievable data rate while reducing the energy consumption and ICI, which outperforms the benchmark scheme.