Massive multiple-input multiple-output (MIMO) enables ultra-high throughput and low latency for tile-based adaptive virtual reality (VR) 360 video transmission in wireless networks. In this paper, we consider a massive MIMO system in which multiple users in a single-cell theater watch the same VR 360 video. Based on tile prediction, the base station (BS) delivers the tiles in the predicted field of view (FoV) to the users. By introducing practical supplementary transmission to handle missing tiles and unacceptable VR sickness, we propose the first stable transmission scheme for VR video. We formulate an integer non-linear programming (INLP) problem to maximize the users' average quality of experience (QoE) score. We then derive the achievable spectral efficiency (SE) expression for predictive tile groups and an approximate achievable SE expression for missing tile groups. Our analysis shows that the overall throughput depends on the number of tile groups and the length of the pilot sequences. By exploiting the relationship between the structure of the viewport tiles and the SE expression, we propose a multi-lattice multi-stream grouping method that aims to improve the overall throughput of VR video transmission. We further analyze the relationship between the QoE objective and the number of predictive tiles, and we transform the original INLP problem into an integer linear programming problem by fixing the predictive tile groups as constants. With variable relaxation and recovery, we obtain the optimal average QoE. Extensive simulation results validate that the proposed algorithm effectively improves QoE.