Multi-robot systems have become very popular in recent years because of their wide spectrum of applications, ranging from surveillance to cooperative payload transportation. Model Predictive Control (MPC) is a promising controller for multi-robot control because of its preview capability and ability to handle constraints easily. The performance of the MPC widely depends on many parameters, among which the prediction horizon is the major contributor. Increasing the prediction horizon beyond a limit drastically increases the computation cost. Tuning the value of the prediction horizon can be very time-consuming, and the tuning process must be repeated for every task. Moreover, instead of using a fixed horizon for an entire task, a better balance between performance and computation cost can be established if different prediction horizons can be employed for every robot at each time step. Further, for such variable prediction horizon MPC for multiple robots, on-demand collision avoidance is the key requirement. We propose Versatile On-demand Collision Avoidance (VODCA) strategy to comply with the variable horizon model predictive control. We also present a framework for learning the prediction horizon for the multi-robot system as a function of the states of the robots using the Soft Actor-Critic (SAC) RL algorithm. The results are illustrated and validated numerically for different multi-robot tasks.