We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an $\varepsilon$-approximately optimal (Bayesian) aggregator. We study the sample complexity of this problem. We show that, for arbitrary discrete distributions, the number of samples must be at least $\tilde \Omega(m^{n-2} / \varepsilon)$, where $m$ is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts $n$. But if experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to $\tilde O(1 / \varepsilon^2)$, which does not depend on $n$.