Abstract: In this work, we propose a novel method for supervised, keyshot-based video summarization that applies a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To address this, we propose a simple self-attention-based network for video summarization that performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method achieves new state-of-the-art results on TvSum and SumMe, two benchmarks commonly used in this domain.
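To show why a self-attention model needs no recurrence, here is a minimal sketch that scores every frame of a video in one pass. It assumes single-head scaled dot-product attention, a linear regression head with a sigmoid, and random untrained weights. The function names (`self_attention_scores`, `matvec`, `random_matrix`) are hypothetical. The authors' layer sizes, normalisation, training loss, and keyshot selection step are not reproduced here.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def self_attention_scores(frames, wq, wk, wv, w_out):
    """Score all frames with one soft self-attention pass (no recurrence)."""
    queries = [matvec(wq, x) for x in frames]
    keys = [matvec(wk, x) for x in frames]
    values = [matvec(wv, x) for x in frames]
    scale = math.sqrt(len(keys[0]))
    scores = []
    for q in queries:
        # Attention weights of this frame over every frame in the video.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Context vector: attention-weighted sum of the value vectors.
        context = [sum(wt * v[j] for wt, v in zip(weights, values))
                   for j in range(len(values[0]))]
        # Linear regression head + sigmoid gives an importance score in (0, 1).
        logit = sum(wo * cj for wo, cj in zip(w_out, context))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def random_matrix(rng, rows, cols):
    """Hypothetical random weights/features for illustration only."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feat_dim, hid_dim, n_frames = 8, 4, 10
frames = random_matrix(rng, n_frames, feat_dim)  # stand-ins for per-frame CNN features
wq, wk, wv = (random_matrix(rng, hid_dim, feat_dim) for _ in range(3))
w_out = [rng.gauss(0.0, 0.5) for _ in range(hid_dim)]
scores = self_attention_scores(frames, wq, wk, wv, w_out)
```

Each frame's score depends on all other frames through the attention weights. The whole sequence is therefore processed in one feed-forward computation rather than step by step, as in a BiLSTM.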
Abstract: The goal of video segmentation is to turn video data into a set of concrete motion clusters that can be easily interpreted as building blocks of the video. There is some work on related topics, such as detecting scene cuts, but little research specifically addresses clustering video data into a desired number of compact segments. It would be more intuitive, and more efficient, to work with perceptually meaningful entities obtained from a low-level grouping process, which we call superframes. This paper presents a new, simple, and efficient technique to detect superframes of similar content patterns in videos. We calculate the content-motion similarity between consecutive frames to obtain the strength of change between them. By building on an existing deep-model-based optical flow technique, the proposed method performs more accurate motion estimation efficiently. We also propose two criteria for measuring and comparing the performance of different algorithms on various databases. Experimental results on videos from benchmark databases demonstrate the effectiveness of the proposed method.
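The idea of cutting a video where consecutive frames change most can be sketched as follows. This is a simplified stand-in, not the paper's method. It assumes each frame is summarised by a motion descriptor vector (for example, a pooled optical flow statistic) and uses 1 minus cosine similarity as the change strength. It then splits the video into a requested number of segments by cutting at the strongest changes. The descriptor, the similarity measure, and the top-k cut rule are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two descriptor vectors (zero vectors handled)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return dot / (na * nb)


def change_strength(descriptors):
    """Change strength between frame t and t + 1, as 1 - cosine similarity."""
    return [1.0 - cosine_similarity(a, b) for a, b in zip(descriptors, descriptors[1:])]


def split_into_segments(descriptors, num_segments):
    """Cut at the (num_segments - 1) strongest changes.

    Returns (start, end) frame index pairs, with end exclusive.
    """
    strengths = change_strength(descriptors)
    ranked = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
    # A change between frames i and i + 1 starts a new segment at frame i + 1.
    cuts = sorted(i + 1 for i in ranked[: num_segments - 1])
    bounds = [0] + cuts + [len(descriptors)]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical 2-D motion descriptors: rightward, then upward, then leftward motion.
descriptors = ([[1.0, 0.1] for _ in range(3)]
               + [[0.1, 1.0] for _ in range(3)]
               + [[-1.0, 0.1] for _ in range(3)])
segments = split_into_segments(descriptors, 3)
```

Here the three motion regimes yield two strong changes, so `segments` recovers the three blocks of three frames each.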
Abstract: In this paper, we present the design and evaluation of an end-to-end trainable deep neural network with a visual attention mechanism for memorability estimation in still images. We analyze the suitability of transferring deep models learned for image classification to the memorability task. Furthermore, we study the impact of the attention mechanism on memorability estimation and evaluate our network on the SUN Memorability and LaMem datasets. Our network outperforms existing state-of-the-art models on both datasets in terms of both Spearman's rank correlation and mean squared error, closely matching human consistency.
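The two evaluation metrics named in the abstract, Spearman's rank correlation and mean squared error, can be computed as in the sketch below. Spearman's rho is taken as the Pearson correlation of the ranks, with tied values receiving their average rank (a common convention). The scores are made-up illustrative values, not results from the paper, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(pred, truth):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(pred), average_ranks(truth))


def mean_squared_error(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


truth = [0.9, 0.6, 0.75, 0.4, 0.8]   # hypothetical ground-truth memorability scores
pred = [0.85, 0.55, 0.7, 0.5, 0.9]   # hypothetical model predictions
rho = spearman_rho(pred, truth)      # approximately 0.9
mse = mean_squared_error(pred, truth)
```

Spearman's rho rewards getting the ordering of images right regardless of scale, while MSE penalises absolute errors. This is why the two metrics are often reported together.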
Abstract: With an increasing number of cameras and only a handful of human operators to monitor the video feeds from hundreds of them, surveillance systems are ill-equipped to detect anomalies. Thus, there is a pressing need to automatically detect regions that require immediate attention, enabling more effective and proactive surveillance. We propose a framework that utilises the temporal variations in the flow field of a crowd scene to automatically detect salient regions, without requiring prior knowledge of the scene or training. We treat the flow field as a dynamical system and adopt the stability theory of dynamical systems to determine the motion dynamics within a given area. In the context of this work, salient regions are areas with high motion dynamics, in which points are unstable. Experimental results on public crowd scenes show that the proposed method effectively detects salient regions corresponding to unstable flow, occlusions, bottlenecks, entries, and exits.
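A classical way to apply stability theory to a flow field is linearisation, sketched below. The flow is approximated around each grid point by its Jacobian. If any eigenvalue has a positive real part, nearby trajectories diverge and the point is locally unstable. This is a stand-in under stated assumptions, not the authors' exact criterion. It uses a single instantaneous flow field (ignoring the temporal variation the abstract relies on), central finite differences, and a hypothetical threshold `eps`.

```python
import math


def jacobian_at(u, v, r, c, h=1.0):
    """Central-difference Jacobian of the flow (u, v) at interior cell (r, c).

    Rows index y, columns index x; h is the grid spacing.
    """
    du_dx = (u[r][c + 1] - u[r][c - 1]) / (2 * h)
    du_dy = (u[r + 1][c] - u[r - 1][c]) / (2 * h)
    dv_dx = (v[r][c + 1] - v[r][c - 1]) / (2 * h)
    dv_dy = (v[r + 1][c] - v[r - 1][c]) / (2 * h)
    return du_dx, du_dy, dv_dx, dv_dy


def max_real_eigenvalue(a, b, c, d):
    """Largest real part among the eigenvalues of [[a, b], [c, d]]."""
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det
    if disc >= 0:
        return (tr + math.sqrt(disc)) / 2
    return tr / 2  # complex-conjugate pair shares real part tr / 2


def instability_map(u, v, eps=1e-9):
    """Mark interior cells whose linearised flow has an eigenvalue with positive real part."""
    rows, cols = len(u), len(u[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            mask[r][c] = max_real_eigenvalue(*jacobian_at(u, v, r, c)) > eps
    return mask


n = 5
# Hypothetical flows on a 5x5 grid centred at (2, 2).
source_u = [[float(x - 2) for x in range(n)] for _ in range(n)]  # diverging (e.g. a dispersing crowd)
source_v = [[float(y - 2) for _ in range(n)] for y in range(n)]
sink_u = [[-val for val in row] for row in source_u]              # converging
sink_v = [[-val for val in row] for row in source_v]
uniform_u = [[1.0] * n for _ in range(n)]                         # steady translation
uniform_v = [[0.0] * n for _ in range(n)]

mask_source = instability_map(source_u, source_v)
mask_sink = instability_map(sink_u, sink_v)
mask_uniform = instability_map(uniform_u, uniform_v)
```

The diverging flow is flagged as unstable everywhere in the interior. The converging flow and the steady translation are not flagged.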
Abstract: Conventional tracking solutions cannot handle abrupt motion because they rely on a smooth-motion assumption or an accurate motion model, whereas abrupt motion is neither continuous nor smooth. To address this, we treat tracking as an optimisation problem and propose SwaTrack, a novel abrupt-motion tracker based on swarm intelligence. Unlike existing swarm-based filtering methods, we first introduce an optimised swarm-based sampling strategy that trades off exploration and exploitation of the search space in the search for the optimal proposal distribution. Second, we propose Dynamic Acceleration Parameters (DAP), which allow on-the-fly tuning of the best mean and variance of the sampling distribution. To our knowledge, no existing work combines these strategies within the particle swarm optimisation (PSO) framework to handle abrupt motion. Both quantitative and qualitative experimental results show the effectiveness of the proposed method in tracking abrupt motion.
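The sketch below shows how adjusting PSO acceleration parameters during the search shifts the swarm from exploration to exploitation. The abstract does not give the DAP update rule. The sketch therefore substitutes a known PSO variant, linearly time-varying acceleration coefficients: the cognitive coefficient `c1` shrinks while the social coefficient `c2` grows. The cost function is a toy squared distance to a hypothetical target position, standing in for a tracker's appearance-based cost. `pso_minimise` and its parameters are hypothetical.

```python
import random


def pso_minimise(cost, dim, bounds, n_particles=30, iters=100,
                 w=0.72, c1_range=(2.5, 0.5), c2_range=(0.5, 2.5), seed=0):
    """Global-best PSO with linearly time-varying acceleration coefficients.

    c1 (pull towards each particle's own best) shrinks over time to favour early
    exploration; c2 (pull towards the swarm's best) grows to favour late exploitation.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for t in range(iters):
        # Interpolate the acceleration coefficients for this iteration.
        frac = t / max(iters - 1, 1)
        c1 = c1_range[0] + (c1_range[1] - c1_range[0]) * frac
        c2 = c2_range[0] + (c2_range[1] - c2_range[0]) * frac
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search window.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


target = (37.0, -12.0)  # hypothetical object position after an abrupt jump


def toy_cost(p):
    """Toy stand-in for an appearance cost: squared distance to the target."""
    return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2


best, best_cost = pso_minimise(toy_cost, dim=2, bounds=(-50.0, 50.0))
```

Because the particles are spread over the whole search window rather than around a predicted position, the swarm can locate the target even after a large jump. This illustrates the motivation for treating abrupt-motion tracking as optimisation.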