Wearable computing and context awareness have recently become focal points of research in artificial intelligence. One of the most appealing, and most challenging, applications is Human Activity Recognition (HAR) using smartphones. Conventional HAR based on the Support Vector Machine relies on manually extracted features that are chosen subjectively. This approach is time- and energy-consuming, and its predictions are limited by the designer's partial view of which features should be extracted. With the rise of deep learning, artificial intelligence has been progressing toward a mature technology. This paper proposes HAR-Net, a new approach that combines deep learning with traditional feature engineering to address these issues in HAR. The study uses data collected by the gyroscopes and acceleration sensors of Android smartphones. The raw sensor data is fed into the proposed HAR-Net, which fuses hand-crafted features with high-level features extracted by a convolutional network to make predictions. The proposed method is shown to perform 0.9% better than the original MC-SVM approach. Experimental results on the UCI dataset demonstrate that fusing the two kinds of features can compensate for the shortcomings of both traditional feature engineering and deep learning techniques.
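The fusion idea can be illustrated with a minimal sketch. It is not the HAR-Net architecture itself: the window shape (6 channels, 128 samples), the choice of statistics, the random untrained kernels, and all function names are assumptions made for illustration only. The sketch computes simple hand-crafted statistics per sensor channel, computes pooled responses of a one-dimensional convolution as a stand-in for learned high-level features, and concatenates the two into one vector that a classifier could consume.

```python
import math
import random

def hand_crafted_features(window):
    """Per-channel mean, standard deviation, min, and max (illustrative choice)."""
    feats = []
    for channel in window:
        n = len(channel)
        mean = sum(channel) / n
        var = sum((x - mean) ** 2 for x in channel) / n
        feats.extend([mean, math.sqrt(var), min(channel), max(channel)])
    return feats

def conv1d_relu_maxpool(channel, kernel):
    """Valid 1-D convolution (cross-correlation), ReLU, then global max pooling."""
    k = len(kernel)
    responses = [
        max(0.0, sum(channel[i + j] * kernel[j] for j in range(k)))
        for i in range(len(channel) - k + 1)
    ]
    return max(responses)

def learned_features(window, kernels):
    """One pooled activation per (channel, kernel) pair; stands in for a CNN."""
    return [conv1d_relu_maxpool(ch, kern) for ch in window for kern in kernels]

def fuse(window, kernels):
    """Concatenate hand-crafted and convolutional features into one vector."""
    return hand_crafted_features(window) + learned_features(window, kernels)

# Hypothetical input: 6 channels (3-axis accelerometer + 3-axis gyroscope),
# 128 samples per channel. Values are synthetic, not real sensor data.
rng = random.Random(0)
window = [[rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(6)]
# Random, untrained kernels; a real network would learn these weights.
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]

fused = fuse(window, kernels)
print(len(fused))  # 6 * 4 hand-crafted + 6 * 4 convolutional = 48
```

In a trained model the convolutional kernels would be learned jointly with the classifier, so the concatenation lets the classifier draw on whichever of the two feature families is more informative for a given activity.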