This paper presents a decentralized leader-follower multi-robot formation control based on a reinforcement learning (RL) algorithm applied to a swarm of small educational Sphero robots. Since the basic Q-learning method is known to require large memory resources for Q-tables, this work implements the Double Deep Q-Network (DDQN) algorithm, which has achieved excellent results in many robotic problems. To enhance the system behavior, we trained two different DDQN models, one for reaching the formation and the other for maintaining it. The models use a discrete set of robot motions (actions) to adapt the continuous nonlinear system to the discrete nature of RL. The presented approach has been tested in simulation and real experiments which show that the multi-robot system can achieve and maintain a stable formation without the need for complex mathematical models and nonlinear control laws.