In many industrial processes, an apparent lack of data limits the development of data-driven soft sensors. There are, however, often opportunities to learn stronger models by being more data-efficient. To achieve this, one can leverage knowledge about the data from which the soft sensor is learned. Taking advantage of properties frequently possessed by industrial data, we introduce a deep latent variable model for semi-supervised multi-unit soft sensing. This hierarchical, generative model can jointly model different units and learn from both labeled and unlabeled data. An empirical study of multi-unit soft sensing is conducted using two datasets: a synthetic dataset of single-phase fluid flow, and a large, real dataset of multi-phase flow in oil and gas wells. We show that by combining semi-supervised and multi-task learning, the proposed model outperforms current leading methods for this soft sensing problem. We also show that a model trained on a multi-unit dataset may be finetuned to previously unseen units using only a handful of data points. In this finetuning procedure, unlabeled data improve soft sensor performance; remarkably, this is true even when no labeled data are available.