Estimating grape yield prior to harvest is important to commercial vineyard production because it informs many vineyard and winery decisions. Currently, yield estimation is time-consuming, and its accuracy varies from 75\% to 90\% depending on the experience of the viticulturist. This paper proposes a multi-task learning (MTL) convolutional neural network (CNN) approach that uses images captured by inexpensive smartphones mounted on a simple tripod arrangement. The CNN models use MTL transfer from autoencoders to achieve 85\% accuracy from image data captured 6 days prior to harvest.