Hidden Markov Model (HMM) combined with Gaussian Process (GP) emission can be effectively used to estimate the hidden state with a sequence of complex input-output relational observations. Especially when the spectral mixture (SM) kernel is used for GP emission, we call this model as a hybrid HMM-GPSM. This model can effectively model the sequence of time-series data. However, because of a large number of parameters for the SM kernel, this model can not effectively be trained with a large volume of data having (1) long sequence for state transition and 2) a large number of time-series dataset in each sequence. This paper proposes a scalable learning method for HMM-GPSM. To effectively train the model with a long sequence, the proposed method employs a Stochastic Variational Inference (SVI) approach. Also, to effectively process a large number of data point each time-series data, we approximate the SM kernel using Reparametrized Random Fourier Feature (R-RFF). The combination of these two techniques significantly reduces the training time. We validate the proposed learning method in terms of its hidden-sate estimation accuracy and computation time using large-scale synthetic and real data sets with missing values.