We propose using recognition networks for approximate inference in Bayesian networks (BNs). A recognition network is a multilayer perceptron (MLP) trained to predict posterior marginals given observed evidence in a particular BN. The input to the MLP is a vector of the states of the evidential nodes. The activity of an output unit is interpreted as a prediction of the posterior marginal of the corresponding variable. The MLP is trained using samples generated from the corresponding BN.

We evaluate a recognition network that was trained to do inference in a large Bayesian network, similar in structure and complexity to the Quick Medical Reference, Decision Theoretic (QMR-DT). Our network is a binary, two-layer, noisy-OR network containing over 4000 potentially observable nodes and over 600 unobservable, hidden nodes. In real medical diagnosis, most observables are unavailable, and there is a complex and unknown bias that selects which ones are provided. We incorporate a very basic type of selection bias in our network: a known preference that available observables are positive rather than negative. Even this simple bias has a significant effect on the posterior. We compare the performance of our recognition network to state-of-the-art approximate inference algorithms on a large set of test cases. In order to evaluate the effect of our simplistic model of the selection bias, we evaluate algorithms using a variety of incorrectly modeled observation biases. Recognition networks perform well using both correct and incorrect observation biases.
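To make the training procedure concrete, the following is a minimal sketch of a recognition network on a toy two-layer noisy-OR network. It is an illustration under stated assumptions, not the configuration evaluated here: the network sizes, priors, leak and link probabilities, the +1/-1/0 evidence encoding, the single tanh hidden layer, the reveal probabilities `p_show_pos` and `p_show_neg`, and all function names are hypothetical choices made for the example. Training cases are sampled from the BN, the selection bias is applied when deciding which findings are revealed, and the MLP is fit by minimizing cross-entropy, so that its sigmoid outputs can be read as approximate posterior marginals of the hidden nodes.

```python
import numpy as np

def sample_cases(rng, n, prior, leak, weights, p_show_pos, p_show_neg):
    """Draw n (evidence, hidden-state) pairs from a two-layer noisy-OR network."""
    # Hidden nodes: independent Bernoulli draws from their priors.
    d = (rng.random((n, prior.size)) < prior).astype(float)
    # Noisy-OR: P(finding off | d) = (1 - leak) * prod_j (1 - w_ij)^d_j.
    log_off = np.log1p(-leak) + d @ np.log1p(-weights).T
    f = rng.random(log_off.shape) < -np.expm1(log_off)
    # Assumed selection bias: positive findings are revealed more often.
    shown = rng.random(f.shape) < np.where(f, p_show_pos, p_show_neg)
    # Assumed evidence encoding: +1 positive, -1 negative, 0 unobserved.
    x = np.where(shown, np.where(f, 1.0, -1.0), 0.0)
    return x, d

def init_mlp(rng, n_in, n_units, n_out):
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_units)), "b1": np.zeros(n_units),
        "W2": rng.normal(0.0, 0.1, (n_units, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Return hidden activations and predicted marginals for each hidden node."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    z = h @ params["W2"] + params["b2"]
    return h, 1.0 / (1.0 + np.exp(-z))

def train_step(params, x, d, lr):
    """One gradient step on the mean cross-entropy between outputs and states."""
    h, y = forward(params, x)
    g_out = (y - d) / len(x)                      # gradient at output pre-activations
    g_hid = (g_out @ params["W2"].T) * (1.0 - h ** 2)
    params["W2"] -= lr * (h.T @ g_out)
    params["b2"] -= lr * g_out.sum(axis=0)
    params["W1"] -= lr * (x.T @ g_hid)
    params["b1"] -= lr * g_hid.sum(axis=0)

def cross_entropy(q, d):
    q = np.clip(q, 1e-9, 1.0 - 1e-9)
    return -np.mean(np.sum(d * np.log(q) + (1.0 - d) * np.log(1.0 - q), axis=1))

def run_demo(seed=0, n_hidden=5, n_obs=20, steps=4000, batch=128):
    """Train on sampled cases; compare held-out loss against the prior marginals."""
    rng = np.random.default_rng(seed)
    prior = np.full(n_hidden, 0.1)
    leak = np.full(n_obs, 0.02)
    # Sparse random link probabilities (illustrative values only).
    links = rng.random((n_obs, n_hidden)) < 0.4
    weights = rng.uniform(0.0, 0.8, (n_obs, n_hidden)) * links
    bias = dict(p_show_pos=0.6, p_show_neg=0.2)
    params = init_mlp(rng, n_obs, 16, n_hidden)
    for _ in range(steps):
        x, d = sample_cases(rng, batch, prior, leak, weights, **bias)
        train_step(params, x, d, lr=0.5)
    x_test, d_test = sample_cases(rng, 5000, prior, leak, weights, **bias)
    _, q = forward(params, x_test)
    ce_net = cross_entropy(q, d_test)
    ce_prior = cross_entropy(np.broadcast_to(prior, d_test.shape), d_test)
    return ce_net, ce_prior

if __name__ == "__main__":
    ce_net, ce_prior = run_demo()
    print(f"held-out cross-entropy: recognition net {ce_net:.3f}, prior {ce_prior:.3f}")
```

Because the training cases are drawn with the selection bias already applied, the network is fit to the posterior under that observation model. Evaluating it on cases drawn with different values of `p_show_pos` and `p_show_neg` is one way to mimic, in miniature, an incorrectly modeled observation bias.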