We present the development and evaluation of a hand tracking algorithm based on single depth images captured from an overhead perspective for use in the COACH prompting system. We train a random decision forest body part classifier using approximately 5,000 manually labeled, unbalanced, partially labeled training images. The classifier represents a random subset of pixels in each depth image with a learned probability density function across all trained body parts. A local mode-find approach is used to search for clusters present in the underlying feature space sampled by the classified pixels. In each frame, body part positions are chosen as the mode with the highest confidence. User hand positions are translated into hand washing task actions based on proximity to environmental objects. We validate the performance of the classifier and task action proposals on a large set of approximately 24,000 manually labeled images.