The beam squint effect, which manifests as different steering matrices in different sub-bands, has been widely regarded as a challenge in millimeter-wave (mmWave) multiple-input multiple-output (MIMO) channel estimation. Existing methods either require specific forms of the precoding/combining matrices, which limits their practicality, or simply ignore the beam squint effect by using only a single sub-band for channel estimation. Recognizing that the different steering matrices are coupled through the same set of unknown channel parameters, this paper proposes to exploit the common sparsity structure of the virtual channel model so that signals from different sub-bands can be jointly utilized to improve channel estimation performance. A probabilistic model is built to induce common sparsity in the spatial domain, and a first-order Taylor expansion is adopted to compensate for the grid mismatch in the dictionaries. To learn the model parameters, a variational expectation-maximization (EM) algorithm is derived, which automatically balances the likelihood function against the common sparsity prior and is applicable to arbitrary forms of precoding/combining matrices. Simulation results show that the proposed algorithm achieves higher estimation accuracy than existing methods under different noise powers and system configurations.
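To make the two modeling ideas above concrete, the sketch below illustrates (a) beam squint, where one physical angle produces a different steering vector in each sub-band, and (b) a first-order Taylor correction of an off-grid steering vector. It is not the paper's estimator. It assumes a uniform linear array (ULA) with half-wavelength spacing at the carrier frequency and uniformly spaced sub-band center frequencies. The carrier, bandwidth, array size, and angles are hypothetical values chosen only for illustration. Note that the same angle `theta` drives the steering vector of every sub-band, which is the coupling that lets sub-bands share a common sparsity pattern.

```python
import cmath
import math


def steering_vector(theta, freq, fc, n_ant):
    """ULA steering vector at `freq`, half-wavelength spacing at carrier `fc` (assumed model).

    Entry n is exp(-j*pi*(freq/fc)*n*sin(theta)); the freq/fc factor causes beam squint.
    """
    ratio = freq / fc
    return [cmath.exp(-1j * math.pi * ratio * n * math.sin(theta)) for n in range(n_ant)]


def steering_derivative(theta, freq, fc, n_ant):
    """Derivative of steering_vector with respect to theta."""
    ratio = freq / fc
    return [-1j * math.pi * ratio * n * math.cos(theta)
            * cmath.exp(-1j * math.pi * ratio * n * math.sin(theta))
            for n in range(n_ant)]


def normalized_gain(a, b):
    """|a^H b| / len(a); equals 1.0 when the two vectors are perfectly aligned."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) / len(a)


def taylor_steering(theta_grid, delta, freq, fc, n_ant):
    """First-order Taylor approximation: a(theta_grid + delta) ~ a(theta_grid) + delta * da/dtheta."""
    a = steering_vector(theta_grid, freq, fc, n_ant)
    da = steering_derivative(theta_grid, freq, fc, n_ant)
    return [x + delta * y for x, y in zip(a, da)]


def max_error(a, b):
    """Largest entry-wise magnitude difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical system parameters (illustrative only).
fc = 28e9          # carrier frequency [Hz]
bandwidth = 4e9    # total bandwidth [Hz]
n_sub = 8          # number of sub-bands
n_ant = 64         # number of ULA antennas
theta = math.radians(40.0)

# Beam squint: compare each sub-band's steering vector with the carrier one.
sub_freqs = [fc + (k - (n_sub - 1) / 2) * bandwidth / n_sub for k in range(n_sub)]
a_center = steering_vector(theta, fc, fc, n_ant)
for f in sub_freqs:
    a_k = steering_vector(theta, f, fc, n_ant)
    print(f"{f / 1e9:6.2f} GHz  gain vs carrier steering: {normalized_gain(a_center, a_k):.3f}")

# Grid mismatch: the true angle lies `delta` away from the nearest grid point.
theta_grid = math.radians(40.0)
delta = math.radians(0.2)
exact = steering_vector(theta_grid + delta, fc, fc, n_ant)
on_grid = steering_vector(theta_grid, fc, fc, n_ant)
first_order = taylor_steering(theta_grid, delta, fc, fc, n_ant)
print("on-grid error:", round(max_error(exact, on_grid), 4))
print("Taylor error :", round(max_error(exact, first_order), 4))
```

With these assumed parameters, the edge sub-bands lose much of their alignment with the carrier steering vector, which is why treating all sub-bands with a single steering matrix degrades estimation. The Taylor term reduces, but does not eliminate, the error from an off-grid angle; in the paper's framework the offset `delta` would be an unknown to be learned rather than a known value as here.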