Rashid Khan

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

Oct 11, 2023

Rashid Khan, Bingding Huang, Haseeb Hassan, Asim Zaman, Zhongfu Ye

Abstract: Image captioning is a challenging task involving generating a textual description for an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using a GRU-based attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract features from the image and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder to enable learning to focus on specific image parts. We evaluate our approach using the MSCOCO and Flickr30k datasets and show that it achieves competitive scores compared to state-of-the-art methods. Our proposed framework can bridge the gap between computer vision and natural language and can be extended to specific domains.
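
For readers unfamiliar with the attention step mentioned in the abstract, the block below writes out the textbook form of Bahdanau (additive) attention feeding a GRU decoder. It is a sketch of the standard formulation, not necessarily the exact equations used in the paper. The symbols are assumptions introduced here: a_i are the CNN feature vectors for image regions, h_t is the decoder state, E y_{t-1} is the embedding of the previous word, and the W, U, and v terms are learned parameters. The sign convention in the last line varies between references.

```latex
\begin{align}
e_{t,i} &= v^{\top} \tanh\left(W_h h_{t-1} + W_a a_i\right)
  && \text{alignment score for region } i \\
\alpha_{t,i} &= \frac{\exp(e_{t,i})}{\sum_{j=1}^{L} \exp(e_{t,j})}
  && \text{attention weights over } L \text{ regions} \\
c_t &= \sum_{i=1}^{L} \alpha_{t,i}\, a_i
  && \text{context vector} \\
x_t &= \left[\, E\, y_{t-1} \,;\, c_t \,\right]
  && \text{decoder input} \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right)
  && \text{update gate} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right)
  && \text{reset gate} \\
\tilde{h}_t &= \tanh\left(W x_t + U \left(r_t \odot h_{t-1}\right)\right)
  && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
  && \text{new decoder state}
\end{align}
```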

* 15 pages, 10 figures, 5 tables. 2023 5th International Conference on Robotics and Computer Vision (ICRCV 2023). arXiv admin note: substantial text overlap with arXiv:2203.01594

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Mar 03, 2022

Rashid Khan, M Shujah Islam, Khadija Kanwal, Mansoor Iqbal, Md. Imran Hossain, Zhongfu Ye

Abstract: Image captioning is a fast-growing research field at the intersection of computer vision and natural language processing that involves generating textual descriptions of images. This study aims to develop a system that uses a pre-trained convolutional neural network (CNN) to extract features from an image, integrates the features with an attention mechanism, and generates captions using a recurrent neural network (RNN). To encode an image into a feature vector, we employ multiple pre-trained convolutional neural networks. A gated recurrent unit (GRU) language model is then used as the decoder to construct the descriptive sentence. To improve performance, we combine the Bahdanau attention model with the GRU so that learning can focus on a specific portion of the image. On the MSCOCO dataset, the experimental results are competitive with state-of-the-art approaches.
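
To make the "focus on a specific portion of the image" idea concrete, here is a minimal pure-Python sketch of one additive (Bahdanau-style) attention step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy weight matrices, and the list-of-lists representation are all hypothetical, and a real system would use learned weights in a deep learning framework.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(features, hidden, W_a, W_h, v):
    """One additive attention step (illustrative sketch).

    features: list of region feature vectors from the CNN encoder
    hidden:   previous decoder (GRU) hidden state
    W_a, W_h, v: projection parameters (learned in practice, toy values here)
    Returns (weights, context): one weight per region and their weighted sum.
    """
    proj_h = matvec(W_h, hidden)
    scores = []
    for region in features:
        proj_a = matvec(W_a, region)
        # score = v . tanh(W_a a_i + W_h h)
        scores.append(
            sum(vk * math.tanh(pa + ph) for vk, pa, ph in zip(v, proj_a, proj_h))
        )
    # Numerically stable softmax over the region scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy usage: three image regions with 2-dimensional features.
identity = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = additive_attention(regions, [0.5, -0.5], identity, identity, [1.0, 1.0])
```

The context vector would then be concatenated with the previous word embedding and passed to the GRU at each decoding step.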

* Information Technology and Control 2022
* 16 pages, 8 figures, 1 table

Context based Roman-Urdu to Urdu Script Transliteration System

Sep 29, 2021

H Muhammad Shakeel, Rashid Khan, Muhammad Waheed

Abstract: Computers are now indispensable and are used in many areas, such as search engines, text processing, short messaging services, voice chat, and text recognition. Over the years, many tools and techniques have been developed to support writing in different language scripts. Many Asian languages, such as Arabic, Urdu, Persian, Chinese, and Korean, are often written using the Roman alphabet. The Roman alphabet is the most commonly used alphabet for transliterating languages that have non-Latin scripts. Many keyboard layouts already exist for entering Urdu characters. Most Urdu speakers nevertheless prefer Roman-Urdu in various applications, because most users are not familiar with the Urdu keyboard. The objective of this work is to improve context-based transliteration of Roman-Urdu to Urdu script. In this paper, we propose an algorithm that effectively addresses the transliteration issues. The algorithm converts each encoded Roman word into words in standard Urdu script and matches them against a lexicon. If a match is found, the word is displayed in the text editor. If more than one match is found, the highest-frequency word is displayed. If no match is found, the first encoded and converted instance is displayed and set as the default. The ambiguous word is then adjusted to its desired location according to its context. The results demonstrate the efficiency and significance of this algorithm compared with other models and algorithms for context-based transliteration of Roman-Urdu to Urdu.
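
The lexicon-matching steps described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the paper's algorithm: the letter table `ROMAN_TO_URDU`, the frequency lexicon `LEXICON`, and both function names are hypothetical toy examples, and the final context-based adjustment step is not modelled.

```python
from itertools import product

# Hypothetical toy mapping: each Roman letter maps to one or more Urdu
# spellings. "a" may be a written alef or an unwritten short vowel.
ROMAN_TO_URDU = {
    "k": ["ک"],
    "a": ["ا", ""],
    "m": ["م"],
    "r": ["ر"],
}

# Hypothetical lexicon: Urdu word -> corpus frequency (made-up counts).
LEXICON = {
    "کام": 120,
    "کم": 300,
}


def candidates(roman_word):
    """Generate Urdu-script candidates for a Roman-Urdu word, in a stable order."""
    options = [ROMAN_TO_URDU.get(ch, [ch]) for ch in roman_word.lower()]
    seen = set()
    result = []
    for combo in product(*options):
        word = "".join(combo)
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


def transliterate(roman_word):
    """Pick an Urdu word following the match rules described in the abstract."""
    cands = candidates(roman_word)
    matches = [c for c in cands if c in LEXICON]
    if matches:
        # One match: return it. Several matches: return the most frequent.
        return max(matches, key=LEXICON.get)
    # No match in the lexicon: fall back to the first converted candidate.
    return cands[0]
```

With this toy data, `transliterate("kam")` finds two lexicon matches and returns the more frequent one, while `transliterate("ram")` finds none and falls back to its first generated candidate.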
