Bayesian belief network learning algorithms have three basic components: a measure that scores a network structure against a database, a search heuristic that chooses the network structures to be considered, and a method of estimating the probability tables from the database. This paper contributes to all three of these topics. The behavior of the Bayesian measure of Cooper and Herskovits and that of a minimum description length (MDL) measure are compared with respect to their properties for both finite databases and databases of limiting size. It is shown that the MDL measure has more desirable properties than the Bayesian measure when a distribution is to be learned. It is also shown that selecting belief networks with certain minimality properties is NP-hard, a result that justifies the use of search heuristics rather than exact algorithms for choosing the network structures to be considered. In some cases, a collection of belief networks can be represented by a single belief network, which leads to a new kind of probability table estimation called smoothing. We argue that smoothing can be implemented efficiently by incorporating it into the search heuristic. Experimental results suggest that smoothing is helpful for learning the probabilities of belief networks.
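For concreteness, the two measures being compared have standard forms in the literature; the notation below is the common one and may differ in detail from the definitions used in the body of the paper. With $r_i$ the number of values of variable $x_i$, $q_i$ the number of configurations of its parents in structure $B_S$, $N_{ijk}$ the number of cases in database $D$ in which $x_i$ takes its $k$-th value while its parents take their $j$-th configuration, and $N_{ij} = \sum_k N_{ijk}$:
\[
P(B_S, D) \;=\; P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!
\qquad \text{(Cooper--Herskovits, maximized)}
\]
\[
L(B_S, D) \;=\; \frac{K}{2}\log N \;-\; \sum_{i=1}^{n}\sum_{j=1}^{q_i}\sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}}
\qquad \text{(MDL, minimized)}
\]
where $K = \sum_i q_i (r_i - 1)$ is the number of free parameters and $N$ the number of cases; MDL formulations often include an additional term for encoding the structure itself.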
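As a rough illustration only, and not the paper's specific scheme, smoothing in the model-averaging sense can be sketched as mixing the conditional probability estimates obtained under several candidate structures, weighted by (normalized) structure scores. The toy data, variable indices, and weights below are all hypothetical.

```python
from collections import Counter
from itertools import product

# Toy database over three binary variables (columns 0, 1, 2); hypothetical data.
DATA = [(0, 0, 0), (0, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0)]

def cpt_estimate(data, child, parents, arity=2):
    """Dirichlet(1) estimate of P(child | parents) under one candidate structure."""
    counts = Counter()
    for row in data:
        counts[(row[child], tuple(row[p] for p in parents))] += 1
    table = {}
    for pa in product(range(arity), repeat=len(parents)):
        n_pa = sum(counts[(v, pa)] for v in range(arity))
        table[pa] = [(counts[(v, pa)] + 1) / (n_pa + arity) for v in range(arity)]
    return table

def smoothed_prob(data, child, assignment, candidates, arity=2):
    """Smoothed estimate: mixture of per-structure estimates, weighted by score."""
    p = 0.0
    for parents, weight in candidates:
        table = cpt_estimate(data, child, parents, arity)
        pa = tuple(assignment[q] for q in parents)  # project onto this parent set
        p += weight * table[pa][assignment[child]]
    return p

# Two candidate parent sets for variable 2; weights stand in for normalized scores.
candidates = [((0,), 0.7), ((0, 1), 0.3)]

# Smoothed P(x2 = 1 | x0 = 1, x1 = 0) mixed over both candidate structures.
print(smoothed_prob(DATA, child=2, assignment=(1, 0, 1), candidates=candidates))
```

Because every candidate parent set is examined anyway while scoring structures, such a mixture can be accumulated during the search itself, which is the sense in which smoothing can be folded into the search heuristic.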