Abstract:The rapid expansion of video content across a variety of industries, including social media, education, entertainment, and surveillance, has made video summarization an essential field of study. The current work is a survey that explores the various approaches and methods created for video summarizing, emphasizing both abstractive and extractive strategies. The process of extractive summarization involves the identification of key frames or segments from the source video, utilizing methods such as shot boundary recognition, and clustering. On the other hand, abstractive summarization creates new content by getting the essential content from the video, using machine learning models like deep neural networks and natural language processing, reinforcement learning, attention mechanisms, generative adversarial networks, and multi-modal learning. We also include approaches that incorporate the two methodologies, along with discussing the uses and difficulties encountered in real-world implementations. The paper also covers the datasets used to benchmark these techniques. This review attempts to provide a state-of-the-art thorough knowledge of the current state and future directions of video summarization research.