Abstract:In recent times, Transformer-based language models are making quite an impact in the field of natural language processing. As relevant parallels can be drawn between biological sequences and natural languages, the models used in NLP can be easily extended and adapted for various applications in bioinformatics. In this regard, this paper introduces the major developments of Transformer-based models in the recent past in the context of nucleotide sequences. We have reviewed and analysed a large number of application-based papers on this subject, giving evidence of the main characterizing features and to different approaches that may be adopted to customize such powerful computational machines. We have also provided a structured description of the functioning of Transformers, that may enable even first time users to grab the essence of such complex architectures. We believe this review will help the scientific community in understanding the various applications of Transformer-based language models to nucleotide sequences. This work will motivate the readers to build on these methodologies to tackle also various other problems in the field of bioinformatics.
Abstract:Prediction of binding sites for transcription factors is important to understand how they regulate gene expression and how this regulation can be modulated for therapeutic purposes. Although in the past few years there are significant works addressing this issue, there is still space for improvement. In this regard, a transformer based capsule network viz. DNABERT-Cap is proposed in this work to predict transcription factor binding sites mining ChIP-seq datasets. DNABERT-Cap is a bidirectional encoder pre-trained with large number of genomic DNA sequences, empowered with a capsule layer responsible for the final prediction. The proposed model builds a predictor for transcription factor binding sites using the joint optimisation of features encompassing both bidirectional encoder and capsule layer, along with convolutional and bidirectional long-short term memory layers. To evaluate the efficiency of the proposed approach, we use a benchmark ChIP-seq datasets of five cell lines viz. A549, GM12878, Hep-G2, H1-hESC and Hela, available in the ENCODE repository. The results show that the average area under the receiver operating characteristic curve score exceeds 0.91 for all such five cell lines. DNABERT-Cap is also compared with existing state-of-the-art deep learning based predictors viz. DeepARC, DeepTF, CNN-Zeng and DeepBind, and is seen to outperform them.
Abstract:Data gathered from multiple sensors can be effectively fused for accurate monitoring of many engineering applications. In the last few years, one of the most sought after applications for multi sensor fusion has been fault diagnosis. Dempster-Shafer Theory of Evidence along with Dempsters Combination Rule is a very popular method for multi sensor fusion which can be successfully applied to fault diagnosis. But if the information obtained from the different sensors shows high conflict, the classical Dempsters Combination Rule may produce counter-intuitive result. To overcome this shortcoming, this paper proposes an improved combination rule for multi sensor data fusion. Numerical examples have been put forward to show the effectiveness of the proposed method. Comparative analysis has also been carried out with existing methods to show the superiority of the proposed method in multi sensor fault diagnosis.
Abstract:Standard security protocols like SSL, TLS, IPSec etc. have high memory and processor consumption which makes all these security protocols unsuitable for resource constrained platforms such as Internet of Things (IoT). Blockchain (BC) finds its efficient application in IoT platform to preserve the five basic cryptographic primitives, such as confidentiality, authenticity, integrity, availability and non-repudiation. Conventional adoption of BC in IoT platform causes high energy consumption, delay and computational overhead which are not appropriate for various resource constrained IoT devices. This work proposes a machine learning (ML) based smart access control framework in a public and a private BC for a smart city application which makes it more efficient as compared to the existing IoT applications. The proposed IoT based smart city architecture adopts BC technology for preserving all the cryptographic security and privacy issues. Moreover, BC has very minimal overhead on IoT platform as well. This work investigates the existing threat models and critical access control issues which handle multiple permissions of various nodes and detects relevant inconsistencies to notify the corresponding nodes. Comparison in terms of all security issues with existing literature shows that the proposed architecture is competitively efficient in terms of security access control.
Abstract:Fault detection in sensor nodes is a pertinent issue that has been an important area of research for a very long time. But it is not explored much as yet in the context of Internet of Things. Internet of Things work with a massive amount of data so the responsibility for guaranteeing the accuracy of the data also lies with it. Moreover, a lot of important and critical decisions are made based on these data, so ensuring its correctness and accuracy is also very important. Also, the detection needs to be as precise as possible to avoid negative alerts. For this purpose, this work has adopted Dempster-Shafer Theory of Evidence which is a popular learning method to collate the information from sensors to come up with a decision regarding the faulty status of a sensor node. To verify the validity of the proposed method, simulations have been performed on a benchmark data set and data collected through a test bed in a laboratory set-up. For the different types of faults, the proposed method shows very competent accuracy for both the benchmark (99.8%) and laboratory data sets (99.9%) when compared to the other state-of-the-art machine learning techniques.