The stochastic reaction network is widely used to model stochastic processes in physics, chemistry and biology. However, the size of the state space increases exponentially with the number of species, making it challenging to investigate the time evolution of the chemical master equation for the reaction network. Here, we propose a machine-learning approach using the variational autoregressive network to solve the chemical master equation. The approach is based on the reinforcement learning framework and does not require any data simulated in prior by another method. Different from simulating single trajectories, the proposed approach tracks the time evolution of the joint probability distribution in the state space of species counts, and supports direct sampling on configurations and computing their normalized joint probabilities. We apply the approach to various systems in physics and biology, and demonstrate that it accurately generates the probability distribution over time in the genetic toggle switch, the early life self-replicator, the epidemic model and the intracellular signaling cascade. The variational autoregressive network exhibits a plasticity in representing the multi-modal distribution by feedback regulations, cooperates with the conservation law, enables time-dependent reaction rates, and is efficient for high-dimensional reaction networks with allowing a flexible upper count limit. The results suggest a general approach to investigate stochastic reaction networks based on modern machine learning.