Q-learning is widely used to optimize large-dimensional networks with unknown system dynamics. Recent advances include multi-environment mixed Q-learning (MEMQ) algorithms, which run multiple independent Q-learning algorithms across several structurally related but distinct environments and outperform several state-of-the-art Q-learning algorithms in accuracy, complexity, and robustness. Here, we conduct a comprehensive probabilistic coverage analysis to ensure optimal data coverage conditions for MEMQ algorithms. First, we derive upper and lower bounds on the expectation and variance of different coverage coefficients (CCs) for MEMQ algorithms. Leveraging these bounds, we develop a simple method for comparing the utilities of multiple environments in MEMQ algorithms. Compared with our previously proposed partial ordering approach, this method appears to be near-optimal. We also present a novel CC-based MEMQ algorithm that improves the accuracy and complexity of existing MEMQ algorithms. Numerical experiments use random network graphs with four different graph properties. Our algorithm can reduce the average policy error (APE) by 65% compared to partial ordering and is 95% faster than exhaustive search. It also achieves 60% lower APE than several state-of-the-art reinforcement learning and prior MEMQ algorithms. Additionally, we numerically verify the theoretical results and show that they scale with the action-space size.