In this work, a real-time hand gesture recognition system-based human-computer interface (HCI) is presented. The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) use of six pre-trained CNN models by using the transfer-learning method, (4) building an interactive human-machine interface, (5) development of a gesture-controlled virtual mouse, (6) use of Kalman filter to estimate the hand position, based on that the smoothness of the motion of pointer is improved. Six pre-trained convolutional neural network (CNN) models (VGG16, VGG19, ResNet50, ResNet101, Inception-V1, and MobileNet-V1) have been used to classify hand gesture images. Three multi-class datasets (two publicly and one custom) have been used to evaluate the model performances. Considering the models' performances, it has been observed that Inception-V1 has significantly shown a better classification performance compared to the other five pre-trained models in terms of accuracy, precision, recall, and F-score values. The gesture recognition system is expanded and used to control multimedia applications (like VLC player, audio player, file management, playing 2D Super-Mario-Bros game, etc.) with different customized gesture commands in real-time scenarios. The average speed of this system has reached 35 fps (frame per seconds), which meets the requirements for the real-time scenario.