Advances in gradient-based inference have made distributional approximations of the posterior distribution of latent-variable models straightforward, but only for continuous latent spaces. Models with discrete latent variables still require analytic marginalization, continuous relaxations, or specialized algorithms that are difficult to generalize even to minor variations of the model. Discrete normalizing flows could, in principle, serve as such approximations while allowing efficient gradient-based learning, but, as we explain in this work, they are not sufficiently expressive to represent realistic posterior distributions even in simple cases. We overcome this limitation by instead considering mixtures of discrete normalizing flows, and present a novel algorithm, based on boosting variational inference, for modeling the posterior distribution of models with discrete latent variables.
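To illustrate the expressiveness issue and the mixture remedy mentioned above, the following minimal sketch uses a toy discrete flow: a modular-shift bijection on a small categorical support. The support size, base pmf, shift values, and mixture weights are illustrative assumptions, not quantities from the paper; the point is only that a single bijection on a finite set can merely permute the base probability values, whereas a convex mixture of such flows can reach probability values outside that orbit.

```python
import numpy as np

K = 4  # toy support size (hypothetical example)
base = np.array([0.4, 0.3, 0.2, 0.1])  # a fixed base pmf over {0, ..., K-1}

def flow_pmf(base_pmf, shift):
    """pmf after the discrete bijection f(z) = (z + shift) mod K.
    A bijection on a finite set has no Jacobian term, so it can
    only permute the probability values of the base pmf."""
    preimage = (np.arange(K) - shift) % K
    return base_pmf[preimage]

# A single discrete flow: always some permutation of (0.4, 0.3, 0.2, 0.1).
single_flow = flow_pmf(base, shift=2)

# A mixture of two discrete flows: a convex combination of permuted pmfs,
# which can produce probability values no single permutation can reach.
weights = np.array([0.6, 0.4])
mixture = weights[0] * flow_pmf(base, shift=0) + weights[1] * flow_pmf(base, shift=2)

print(single_flow)    # [0.2 0.1 0.4 0.3]
print(mixture)        # [0.32 0.22 0.28 0.18] -- not a permutation of base
print(mixture.sum())  # 1.0
```

A boosting-style variational scheme, as referenced in the abstract, would add such components one at a time and fit their weights; this sketch only shows why the mixture family is strictly richer than a single discrete flow.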