Reservoir computing is a form of machine learning that utilizes nonlinear dynamical systems to perform complex tasks in a cost-effective manner when compared to typical neural networks. Many recent advancements in reservoir computing, in particular quantum reservoir computing, make use of reservoirs that are inherently stochastic. However, the theoretical justification for using these systems has not yet been well established. In this paper, we investigate the universality of stochastic reservoir computers, in which we use a stochastic system for reservoir computing using the probabilities of each reservoir state as the readout instead of the states themselves. In stochastic reservoir computing, the number of distinct states of the entire reservoir computer can potentially scale exponentially with the size of the reservoir hardware, offering the advantage of compact device size. We prove that classes of stochastic echo state networks, and therefore the class of all stochastic reservoir computers, are universal approximating classes. We also investigate the performance of two practical examples of stochastic reservoir computers in classification and chaotic time series prediction. While shot noise is a limiting factor in the performance of stochastic reservoir computing, we show significantly improved performance compared to a deterministic reservoir computer with similar hardware in cases where the effects of noise are small.