Jointly using multiple discrete frequency bands can improve the accuracy of delay estimation. Although some challenges unique to multiband fusion, such as phase distortion, oscillation phenomena, and high-dimensional search, have been partially addressed, others remain. Specifically, under low signal-to-noise ratio (SNR), insufficient data, and closely spaced delay paths, it becomes difficult to accurately determine the model order, that is, the number of delay paths. Misestimating the model order can significantly degrade the estimation performance of traditional methods. To address joint model selection and parameter estimation under such harsh conditions, we propose a multi-model stochastic particle-based variational Bayesian inference (MM-SPVBI) framework capable of exploring multiple high-dimensional parameter spaces. First, we split potentially overlapping primary delay paths based on coarse estimates, generating several parallel candidate models. Then, an auto-focusing sampling strategy is employed to quickly identify the optimal model. Additionally, we introduce a hybrid posterior approximation that improves on the original single-model SPVBI, ensuring that the overall complexity does not increase significantly with the number of parallel models. Simulations demonstrate that our algorithm offers substantial advantages over existing methods.