Abstract: Videos can be an effective way to deliver contextualized, just-in-time medical information for patient education. However, video analysis is extremely challenging at every stage, from topic identification and retrieval to extracting medical information and assessing understandability from a patient's perspective. This study demonstrates a data analysis pipeline that retrieves medical information from YouTube videos on preparing for a colonoscopy, a much-maligned and widely disliked procedure that patients find difficult to prepare for adequately. We first use the YouTube Data API to collect metadata for videos returned by selected search keywords, and then use the Google Video Intelligence API to analyze their text, frame, and object data. Next, we annotate the videos for medical information, video understandability, and overall recommendation. We develop a bidirectional long short-term memory (BiLSTM) model to identify medical terms in the videos, and we build three classifiers that group videos by the level of encoded medical information, by video understandability, and by whether the videos are recommended. Our study provides healthcare stakeholders with guidelines and a scalable approach for generating new educational video content to improve the management of a vast number of health conditions.
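The abstract does not state how the BiLSTM's output is turned into medical terms. Suppose, purely as an assumption, that the model labels each token of a video's text (for example, transcript or on-screen text) with BIO tags. In that case, one post-processing step is to merge consecutive tagged tokens into term spans. The sketch below shows that step only. The tag names `B-MED`/`I-MED`/`O` and the function `bio_to_terms` are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def bio_to_terms(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Merge per-token BIO tags into (start, end, term) spans, with end exclusive.

    Hypothetical tag scheme (an assumption, not the study's):
      "B-MED" begins a medical term, "I-MED" continues it, anything else is outside.
    An "I-MED" with no open term is treated as the start of a new term.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open term began, if any

    for i, tag in enumerate(tags):
        if tag == "B-MED" or (tag == "I-MED" and start is None):
            # A new term begins here; close any term that was still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I-MED":
            pass  # the open term continues through this token
        else:
            # An outside tag closes the open term, if there is one.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None

    # Close a term that runs to the end of the sequence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans
```

For example, the tokens `["Drink", "the", "bowel", "prep", "solution", "before", "your", "colonoscopy"]` with tags `["O", "O", "B-MED", "I-MED", "I-MED", "O", "O", "B-MED"]` yield the spans `(2, 5, "bowel prep solution")` and `(7, 8, "colonoscopy")`. The resulting term lists could then serve as features for the video-level classifiers, although the abstract does not say how those classifiers consume the BiLSTM's output.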