Abstract:With the exponential increase in video content, the need for accurate deception detection in human-centric video analysis has become paramount. This research focuses on the extraction and combination of various features to enhance the accuracy of deception detection models. By systematically extracting features from visual, audio, and text data, and experimenting with different combinations, we developed a robust model that achieved an impressive 99% accuracy. Our methodology emphasizes the significance of feature engineering in deception detection, providing a clear and interpretable framework. We trained various machine learning models, including LSTM, BiLSTM, and pre-trained CNNs, using both single and multi-modal approaches. The results demonstrated that combining multiple modalities significantly enhances detection performance compared to single modality training. This study highlights the potential of strategic feature extraction and combination in developing reliable and transparent automated deception detection systems in video analysis, paving the way for more advanced and accurate detection methodologies in future research.