We propose to utilize a variational autoencoder (VAE) for data-driven channel estimation. The underlying true and unknown channel distribution is modeled by the VAE as a conditional Gaussian distribution in a novel way, parameterized by the respective first and second order conditional moments. As a result, it can be observed that the linear minimum mean square error (LMMSE) estimator in its variant conditioned on the latent sample of the VAE approximates an optimal MSE estimator. Furthermore, we argue how a VAE-based channel estimator can approximate the MMSE channel estimator. We propose three variants of VAE estimators that differ in the data used during training and estimation. First, we show that given perfectly known channel state information at the input of the VAE during estimation, which is impractical, we obtain an estimator that can serve as a benchmark result for an estimation scenario. We then propose practically feasible approaches, where perfectly known channel state information is only necessary in the training phase or is not needed at all. Simulation results on 3GPP and QuaDRiGa channel data attest a small performance loss of the practical approaches and the superiority of our VAE approaches in comparison to other related channel estimation methods.