Modern recommender systems usually present items as a one-dimensional ranked list. Recently, there has been a trend in e-commerce toward organizing the recommended items as two-dimensional grid-based panels, where users can view the items in both vertical and horizontal directions. Presenting items in grid-based result panels poses new challenges to recommender systems because existing models are all designed to output sequential lists, while the slots in a grid-based panel have no explicit order. Directly converting the item rankings into grids (e.g., by pre-defining an order on the slots) overlooks user-specific behavioral patterns on grid-based panels and inevitably hurts the user experience. To address this issue, we propose a novel Markov decision process (MDP) to place the items in 2D grid-based result panels at the final re-ranking stage of the recommender system. The model, referred to as Panel-MDP, takes an initial item ranking from the early stages as its input. It defines the discrete time steps of the MDP as the ranks in the initial ranking list, and the actions as the slots in the grid-based panel plus a NULL action. At each time step, Panel-MDP takes an action that either selects one slot in which to place the current item of the initial ranking list, or discards the item if the NULL action is selected. The process continues until all of the slots are filled. The DQN reinforcement learning algorithm is employed to implement Panel-MDP and learn its parameters. Experiments on a dataset collected from a widely-used e-commerce app demonstrated the superiority of Panel-MDP in terms of recommending 2D grid-based result panels.
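
The placement process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `place_items`, `toy_q`, and the `NULL = -1` encoding are hypothetical, the learned DQN is replaced by an arbitrary scoring callable `q_value`, and the sketch assumes that a slot can hold at most one item and that the episode also ends if the initial ranking is exhausted before every slot is filled.

```python
from typing import Callable, List, Optional, Sequence

NULL = -1  # hypothetical encoding of the NULL (discard) action

Panel = List[Optional[str]]
QFunction = Callable[[int, str, Panel, int], float]


def place_items(ranking: Sequence[str], n_rows: int, n_cols: int,
                q_value: QFunction) -> List[Panel]:
    """Greedily roll out one episode: one time step per rank in `ranking`."""
    n_slots = n_rows * n_cols
    panel: Panel = [None] * n_slots   # flat view of the grid, row-major
    free = set(range(n_slots))        # slots that are still empty

    for t, item in enumerate(ranking):
        if not free:                  # all slots filled: episode ends
            break
        # Valid actions: any empty slot, or NULL to discard the item.
        actions = sorted(free) + [NULL]
        action = max(actions, key=lambda a: q_value(t, item, panel, a))
        if action != NULL:
            panel[action] = item
            free.remove(action)

    # Reshape the flat panel into rows for display.
    return [panel[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def toy_q(t: int, item: str, panel: Panel, action: int) -> float:
    """Stand-in for a trained Q-network (assumption, for illustration only).

    Discards item "b" and otherwise prefers the lowest-indexed free slot.
    """
    if action == NULL:
        return 1.0 if item == "b" else -1.0
    return -0.01 * action


grid = place_items(["a", "b", "c", "d", "e", "f"], 2, 2, toy_q)
print(grid)  # [['a', 'c'], ['d', 'e']]
```

In this toy run, item "b" is discarded through the NULL action and item "f" is never considered because the 2×2 panel is already full after "e". In a learned setting, `q_value` would be the output of the trained network evaluated on the current state, and the greedy `max` would typically be replaced by an exploration strategy during training.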