Abstract:While OCR has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving the optical character recognition (OCR) with ML techniques with integration of OCR with long short-term memory (LSTM) based sequence to sequence deep learning models to perform document translation. This work is based on ANKI dataset for English to Spanish translation. In this work, I have shown comparative study for pre-trained OCR while using deep learning model using LSTM-based seq2seq architecture with attention for machine translation. End-to-end performance of the model has been expressed in BLEU-4 score. This research paper is aimed at researchers and practitioners interested in OCR and its applications in document translation.
Abstract:The recent success of the CLIP model has shown its potential to be applied to a wide range of vision and language tasks. However this only establishes embedding space relationship of language to images, not to the video domain. In this paper, we propose a novel approach to map video embedding space to natural langugage. We propose a two-stage approach that first extracts visual features from each frame of a video using a pre-trained CNN, and then uses the CLIP model to encode the visual features for the video domain, along with the corresponding text descriptions. We evaluate our method on two benchmark datasets, UCF101 and HMDB51, and achieve state-of-the-art performance on both tasks.
Abstract:Cyber attacks has always been of a great concern. Websites and services with poor security layers are the most vulnerable to such cyber attacks. The attackers can easily access sensitive data like credit card details and social security number from such vulnerable services. Currently to stop cyber attacks, various different methods are opted from using two-step verification methods like One-Time Password and push notification services to using high-end bio-metric devices like finger print reader and iris scanner are used as security layers. These current security measures carry a lot of cons and the worst is that user always need to carry the authentication device on them to access their data. To overcome this, we are proposing a technique of using keystroke dynamics (typing pattern) of a user to authenticate the genuine user. In the method, we are taking a data set of 51 users typing a password in 8 sessions done on alternate days to record mood fluctuations of the user. Developed and implemented anomaly-detection algorithm based on distance metrics and machine learning algorithms like Artificial Neural networks (ANN) and convolutional neural network (CNN) to classify the users. In ANN, we implemented multi-class classification using 1-D convolution as the data was correlated and multi-class classification with negative class which was used to classify anomaly based on all users put together. We were able to achieve an accuracy of 95.05% using ANN with Negative Class. From the results achieved, we can say that the model works perfectly and can be bought into the market as a security layer and a good alternative to two-step verification using external devices. This technique will enable users to have two-step security layer without worrying about carry an authentication device.
Abstract:This research paper focuses on the problem of dynamic objects and their impact on effective motion planning and localization. The paper proposes a two-step process to address this challenge, which involves finding the dynamic objects in the scene using a Flow-based method and then using a deep Video inpainting algorithm to remove them. The study aims to test the validity of this approach by comparing it with baseline results using two state-of-the-art SLAM algorithms, ORB-SLAM2 and LSD, and understanding the impact of dynamic objects and the corresponding trade-offs. The proposed approach does not require any significant modifications to the baseline SLAM algorithms, and therefore, the computational effort required remains unchanged. The paper presents a detailed analysis of the results obtained and concludes that the proposed method is effective in removing dynamic objects from the scene, leading to improved SLAM performance.