Visual Human Activity Recognition (HAR) and data fusion with other sensors can help us at tracking the behavior and activity of underground miners with little obstruction. Existing models, such as Single Shot Detector (SSD), trained on the Common Objects in Context (COCO) dataset is used in this paper to detect the current state of a miner, such as an injured miner vs a non-injured miner. Tensorflow is used for the abstraction layer of implementing machine learning algorithms, and although it uses Python to deal with nodes and tensors, the actual algorithms run on C++ libraries, providing a good balance between performance and speed of development. The paper further discusses evaluation methods for determining the accuracy of the machine-learning and an approach to increase the accuracy of the detected activity/state of people in a mining environment, by means of data fusion.