We present an end-to-end computer vision system for mapping yield in an apple orchard using images captured from a single camera. Our proposed system is platform-independent and does not require any specific lighting conditions. Our main technical contributions are 1)~a semi-supervised clustering algorithm that uses color to identify apples and 2)~an unsupervised clustering method that uses spatial properties to estimate fruit counts from apple clusters with arbitrarily complex geometry. Additionally, we utilize camera motion to merge the counts across multiple views. We verified the performance of our algorithms through multiple field trials on three tree rows comprising $252$ trees at the University of Minnesota Horticultural Research Center. Results indicate that the detection method achieves an $F_1$-measure of $0.95$--$0.97$ across multiple color varieties and lighting conditions. The counting method achieves an accuracy of $89\%$--$98\%$. We also report merged fruit counts from both sides of the tree rows. Our yield estimation method achieves an overall accuracy of $91.98\%$--$94.81\%$ across the different datasets.