Abstract: Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. VSR has many applications as an assistive technology, especially if it could be deployed in mobile devices and embedded systems. The need for intensive computational resources and a large memory footprint are two of the major obstacles in developing neural network models for VSR in a resource-constrained environment. We propose a novel end-to-end deep neural network architecture for word-level VSR called MobiVSR, with a design parameter that aids in balancing the model's accuracy and parameter count. We use depthwise-separable 3D convolution for the first time in the domain of VSR and show how it makes our model efficient. MobiVSR achieves an accuracy of 73\% on the challenging Lip Reading in the Wild dataset with 6 times fewer parameters and a 20 times smaller memory footprint than the current state of the art. MobiVSR can also be compressed to 6 MB by applying post-training quantization.
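The abstract itself contains no code, but a minimal sketch of the depthwise-separable 3D convolution block it refers to may help clarify where the parameter savings come from. The sketch below assumes a PyTorch-style implementation; the channel counts, kernel size, and input clip shape are illustrative assumptions, not the configuration used in MobiVSR.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Depthwise-separable 3D convolution: a per-channel (depthwise) 3D conv
    followed by a 1x1x1 pointwise conv. Compared with a standard 3D conv it
    greatly reduces parameter count at a small cost in representational power."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: groups == in_channels, so each channel is convolved separately.
        self.depthwise = nn.Conv3d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=kernel_size // 2,
                                   groups=in_channels, bias=False)
        # Pointwise: 1x1x1 conv mixes information across channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# Illustrative use on a (batch, channels, frames, height, width) lip-region clip;
# the clip shape here is an assumption, not the paper's preprocessing.
clip = torch.randn(1, 1, 29, 88, 88)
block = DepthwiseSeparableConv3d(in_channels=1, out_channels=64)
print(block(clip).shape)  # torch.Size([1, 64, 29, 88, 88])
```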
Abstract: The Particle Filter (PF) is used extensively for the estimation of non-linear and non-Gaussian target states. However, its performance suffers from the inherent problems of sample degeneracy and impoverishment. To address this, we propose a novel resampling method based on Crow Search Optimization that replaces low-performing particles detected as outliers. The proposed outlier detection mechanism with transductive reliability achieves faster convergence of the proposed PF tracking framework. In addition, we present an adaptive fuzzy fusion model to integrate multiple cues extracted for each evaluated particle. Automatic boosting and suppression of particles using the proposed fusion model not only enhances the performance of the resampling method but also achieves optimal state estimation. The performance of the proposed tracker is evaluated over 12 benchmark video sequences and compared with state-of-the-art solutions. Qualitative and quantitative results reveal that the proposed tracker not only outperforms existing solutions but also efficiently handles various tracking challenges. On average, we achieve a CLE of 7.98 and an F-measure of 0.734.
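For context on the resampling problem this abstract targets, the sketch below shows the conventional systematic resampling step whose degeneracy and impoverishment issues the proposed Crow Search based resampler is meant to address. It is not the paper's method; the 2D state and all names are illustrative assumptions.

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Standard systematic resampling: duplicates high-weight particles and
    discards low-weight ones, which is the source of sample impoverishment."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # one random offset, evenly spaced
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                            # guard against round-off error
    indices = np.searchsorted(cumulative, positions)
    resampled = particles[indices]
    new_weights = np.full(n, 1.0 / n)               # weights reset to uniform
    return resampled, new_weights

# Illustrative use with a 2D state (e.g., a target centre position).
rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 2))
weights = rng.random(100)
weights /= weights.sum()
new_particles, new_weights = systematic_resample(particles, weights, rng)
```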