The Bayes coding algorithm for context tree sources is a successful example of Bayesian tree estimation from text compression in information theory. The algorithm provides an efficient parametric representation of the posterior tree distribution and updates its parameters exactly. We apply this algorithm to a clustering task in machine learning, specifically to Bayesian estimation of tree-structured stick-breaking process (TS-SBP) mixture models. For TS-SBP mixture models, only Markov chain Monte Carlo methods have been proposed so far; no variational Bayesian method has yet been proposed. In this paper, we propose a variational Bayesian method that has a subroutine similar to the Bayes coding algorithm for context tree sources. We confirm its behavior through a numerical experiment on a toy example.