Deep generative models have shown great promise in synthesising novel images. While they can generate images that look convincing at a high level, generating fine-grained details remains a challenge. To foster research on more powerful generative approaches, this paper proposes a novel task: generative modelling of 2D tree skeletons. Trees are an interesting shape class because they exhibit complexity and variation well suited to measuring a generative model's ability to generate detailed structures. We propose a new dataset for this task and demonstrate that state-of-the-art generative models fail to synthesise realistic images on our benchmark, even though they perform well on existing datasets such as MNIST digits. Motivated by these results, we propose a novel network architecture that combines a variational autoencoder based on recurrent neural networks with a convolutional discriminator. The network, error metrics and training procedure are adapted to the task of fine-grained sketching. Through quantitative and perceptual experiments, we show that our model outperforms previous work and that our dataset is a valuable benchmark for generative models. We will make our dataset publicly available.
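To make the architecture described above concrete, the following is a minimal sketch, not the authors' released code, of one plausible realisation: a VAE whose encoder and decoder are recurrent networks over stroke sequences, paired with a convolutional discriminator on rendered skeleton images. All names and sizes (`STROKE_DIM`, `LATENT_DIM`, the 64x64 image resolution) are illustrative assumptions, not specifics from the paper.

```python
import torch
import torch.nn as nn

STROKE_DIM = 5   # assumed (dx, dy, pen-state)-style stroke encoding
HIDDEN_DIM = 256
LATENT_DIM = 64

class RecurrentVAE(nn.Module):
    """VAE with GRU encoder/decoder over stroke sequences (illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(STROKE_DIM, HIDDEN_DIM, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.to_logvar = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.z_to_h = nn.Linear(LATENT_DIM, HIDDEN_DIM)
        self.decoder = nn.GRU(STROKE_DIM, HIDDEN_DIM, batch_first=True)
        self.to_stroke = nn.Linear(HIDDEN_DIM, STROKE_DIM)

    def forward(self, strokes):
        # strokes: (batch, seq_len, STROKE_DIM)
        _, h = self.encoder(strokes)              # h: (1, batch, HIDDEN_DIM)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        out, _ = self.decoder(strokes, h0)        # teacher forcing
        return self.to_stroke(out), mu, logvar

class ConvDiscriminator(nn.Module):
    """Scores rendered skeleton images as real vs. generated (illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 1),           # assumes 64x64 greyscale input
        )

    def forward(self, images):
        return self.net(images)
```

In such a hybrid, the VAE's reconstruction and KL terms would typically be combined with an adversarial loss from the discriminator applied to rendered outputs; the exact losses and training schedule here are assumptions, since the abstract only states that they are adapted to fine-grained sketching.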