We study the problem of learning the structure of an optimal Bayesian network $D$ when additional constraints are imposed on the DAG $D$ or on its moralized graph. More precisely, we consider the constraint that the moralized graph can be transformed into a graph from a sparse graph class $\Pi$ by at most $k$ vertex deletions. We show that when $\Pi$ is the class of graphs with maximum degree $1$, an optimal network can be computed in polynomial time for constant $k$, extending previous work that gave an algorithm with such a running time when $\Pi$ is the class of edgeless graphs [Korhonen & Parviainen, NIPS 2015]. We then show that further extensions or improvements are presumably impossible. For example, we show that when $\Pi$ is the class of graphs with maximum degree $2$ or the class of graphs in which each component has size at most three, learning an optimal network is NP-hard even if $k=0$. Finally, we show that learning an optimal network with at most $k$ edges in the moralized graph presumably has no $f(k)\cdot |I|^{\mathcal{O}(1)}$-time algorithm and that, in contrast, an optimal network with at most $k$ arcs in the DAG $D$ can be computed in $2^{\mathcal{O}(k)}\cdot |I|^{\mathcal{O}(1)}$ time, where $|I|$ is the total input size.
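
To make the vertex-deletion constraint concrete, the following Python sketch moralizes a DAG and checks by brute force whether at most $k$ vertex deletions leave a graph of maximum degree at most $1$. It only illustrates the feasibility condition and is not the learning algorithm referred to above. The function names and the representation of a DAG as a dictionary mapping each vertex to its parent set are assumptions made for illustration.

```python
from itertools import combinations


def moralize(parents):
    """Return the moralized graph of a DAG as an adjacency dict of sets.

    `parents` maps each vertex to the set of its parents (hypothetical
    input format chosen for this sketch).
    """
    vertices = set(parents)
    for ps in parents.values():
        vertices |= set(ps)
    adj = {v: set() for v in vertices}
    for child, ps in parents.items():
        # Keep every arc as an undirected edge.
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        # "Marry" every pair of parents of a common child.
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def max_degree_after_deletion(adj, deleted):
    """Maximum degree of the graph induced on the non-deleted vertices."""
    return max(
        (len(nbrs - deleted) for v, nbrs in adj.items() if v not in deleted),
        default=0,
    )


def within_k_deletions_of_degree_one(parents, k):
    """Brute-force test: can at most k vertex deletions turn the moralized
    graph into a graph of maximum degree at most 1?

    Tries all vertex subsets of size at most k, so it is only meant for
    very small examples.
    """
    adj = moralize(parents)
    vertices = list(adj)
    for size in range(k + 1):
        for deleted in combinations(vertices, size):
            if max_degree_after_deletion(adj, set(deleted)) <= 1:
                return True
    return False
```

For instance, for the v-structure with arcs $a \to c$ and $b \to c$, moralization adds the edge $\{a, b\}$, so the moralized graph is a triangle: the check fails for $k=0$ and succeeds for $k=1$.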