Abstract:One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To better help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset and show that our method outperforms state-of-the-art techniques on a suite of different automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessities of different components of our proposed architecture and shows the effectiveness of the architecture for visual storytelling.
Abstract:Recently, a multi-level fuzzy min max neural network (MLF) was proposed, which improves the classification accuracy by handling an overlapped region (area of confusion) with the help of a tree structure. In this brief, an extension of MLF is proposed which defines a new boundary region, where the previously proposed methods mark decisions with less confidence and hence misclassification is more frequent. A methodology to classify patterns more accurately is presented. Our work enhances the testing procedure by means of data centroids. We exhibit an illustrative example, clearly highlighting the advantage of our approach. Results on standard datasets are also presented to evidentially prove a consistent improvement in the classification rate.