Spatial count data models are used to explain and predict the frequency of phenomena such as traffic accidents in geographically distinct entities such as census tracts or road segments. These models are typically estimated using Bayesian Markov chain Monte Carlo (MCMC) simulation methods, which, however, are computationally expensive and do not scale well to large datasets. Variational Bayes (VB), a method from machine learning, addresses the shortcomings of MCMC by casting Bayesian estimation as an optimisation problem instead of a simulation problem. In this paper, we derive a VB method for posterior inference in negative binomial models with unobserved parameter heterogeneity and spatial dependence. The proposed method uses Polya-Gamma augmentation to deal with the non-conjugacy of the negative binomial likelihood and an integrated non-factorised specification of the variational distribution to capture posterior dependencies. We demonstrate the benefits of the approach using simulated data and real data on youth pedestrian injury counts in the census tracts of New York City boroughs Bronx and Manhattan. The empirical analysis suggests that the VB approach is between 7 and 13 times faster than MCMC on a regular eight-core processor, while offering similar estimation and predictive accuracy. Conditional on the availability of computational resources, the embarrassingly parallel architecture of the proposed VB method can be exploited to further accelerate the estimation by up to 100 times.