One of the most important problems in computer vision and remote sensing is object detection, which identifies particular categories of diverse things in pictures. Two crucial data sources for public security are the thermal infrared (TIR) remote sensing multi-scenario photos and videos produced by unmanned aerial vehicles (UAVs). Due to the small scale of the target, complex scene information, low resolution relative to the viewable videos, and dearth of publicly available labeled datasets and training models, their object detection procedure is still difficult. A UAV TIR object detection framework for pictures and videos is suggested in this study. The Forward-looking Infrared (FLIR) cameras used to gather ground-based TIR photos and videos are used to create the ``You Only Look Once'' (YOLO) model, which is based on CNN architecture. Results indicated that in the validating task, detecting human object had an average precision at IOU (Intersection over Union) = 0.5, which was 72.5\%, using YOLOv7 (YOLO version 7) state of the art model \cite{1}, while the detection speed around 161 frames per second (FPS/second). The usefulness of the YOLO architecture is demonstrated in the application, which evaluates the cross-detection performance of people in UAV TIR videos under a YOLOv7 model in terms of the various UAVs' observation angles. The qualitative and quantitative evaluation of object detection from TIR pictures and videos using deep-learning models is supported favorably by this work.