Abstract: Bioinformatics deals with how biological data are stored, accessed and presented, and with the characteristics within these data. Most problems in bioinformatics can be addressed efficiently by computational techniques. This paper aims at building a classifier based on Multiple Attractor Cellular Automata (MACA), which uses fuzzy logic with version Z, to predict splicing sites and to identify protein coding and promoter regions in eukaryotes. It is strengthened with an artificial immune system (AIS) technique, the clonal selection algorithm, for choosing the rules with the best fitness. The proposed classifier can handle DNA sequences of lengths 54, 108, 162, 252 and 354. It gives the exact boundaries of both protein coding and promoter regions with an average accuracy of 90.6% and predicts splicing sites with 97% accuracy. The classifier was tested with 197,000 data components taken from Fickett & Tung, EPDnew and other sequences from a renowned medical university.
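The abstract does not spell out how a MACA assigns a class. As a rough illustration, assuming a linear (additive) CA over GF(2) whose next state is T·x mod 2, a MACA-style classifier runs the CA from the input pattern until it reaches an attractor and reports the class attached to that attractor's basin. The sketch below shows only that core idea; the matrix `T`, the helper names `step`, `attractor`, `train` and `classify`, and the toy training data are hypothetical, and the fuzzy logic, version Z and clonal-selection parts of the authors' classifier are not modelled.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


def step(T: Sequence[Sequence[int]], x: State) -> State:
    """One synchronous update of a linear CA over GF(2): x' = T x (mod 2)."""
    return tuple(sum(t * b for t, b in zip(row, x)) % 2 for row in T)


def attractor(T: Sequence[Sequence[int]], x: State) -> State:
    """Iterate until a state repeats; return the smallest state on the cycle reached."""
    order: Dict[State, int] = {}
    trajectory: List[State] = []
    while x not in order:
        order[x] = len(trajectory)
        trajectory.append(x)
        x = step(T, x)
    # The cycle starts at the first occurrence of the repeated state.
    return min(trajectory[order[x]:])


def train(T, samples):
    """Label each attractor basin with the majority class of the training patterns in it."""
    votes: Dict[State, Counter] = defaultdict(Counter)
    for pattern, label in samples:
        votes[attractor(T, pattern)][label] += 1
    return {att: counts.most_common(1)[0][0] for att, counts in votes.items()}


def classify(T, model, pattern, default=None):
    """Return the class of the basin the pattern falls into (default if the basin is unseen)."""
    return model.get(attractor(T, pattern), default)


# Hypothetical 2-cell CA: x0' = 0, x1' = x0 XOR x1.
T = [[0, 0], [1, 1]]
samples = [((0, 0), "A"), ((1, 1), "A"), ((0, 1), "B"), ((1, 0), "B")]
model = train(T, samples)
print(classify(T, model, (1, 1)))  # prints A
```

In this toy CA the two basins coincide with even and odd parity, so `(1, 1)` falls into the basin of attractor `(0, 0)` and is labelled `A`. A real MACA would use a much larger, carefully synthesised `T` so that basins separate the classes of interest.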
Abstract: This paper aims at providing a survey of the problems in bioinformatics that can be readily addressed by cellular automata (CA). Some authors have proposed algorithms for particular problems in bioinformatics, but the application of cellular automata in bioinformatics remains a largely unexplored research field. No researcher has yet tried to relate the major problems in bioinformatics and find a common solution. An extensive literature survey was conducted, covering papers from various journals and conferences. This paper provides intuition for relating various problems in bioinformatics logically and tries to arrive at a common framework for addressing them.
Abstract: Most problems in bioinformatics are now computational challenges. This paper aims at building a classifier based on Multiple Attractor Cellular Automata (MACA) that uses fuzzy logic. It is strengthened with an artificial immune system (AIS) technique, the clonal selection algorithm, to identify protein coding and promoter regions in a given DNA sequence. The proposed classifier, named AIS-INMACA, introduces a novel concept: combining cellular automata (CA) with an artificial immune system to produce a better classifier that can address major problems in bioinformatics. It is the first integrated algorithm that can predict both promoter and protein coding regions. The basic concept of the clonal selection algorithm is used to obtain rules with good fitness. The proposed classifier can handle DNA sequences of lengths 54, 108, 162, 252 and 354. It gives the exact boundaries of both protein coding and promoter regions with an average accuracy of 89.6%. The classifier was tested with 97,000 data components taken from Fickett & Tung, MPromDb and other sequences from a renowned medical university. It can handle large datasets and can find protein coding and promoter regions even in mixed and overlapping DNA sequences. This work also aims at identifying the logical relationships between the major problems in bioinformatics and tries to obtain a common framework for addressing them, including protein structure prediction, RNA structure prediction, prediction of the splicing pattern of any primary transcript, and analysis of the information content of DNA, RNA and protein sequences and structures. This work is expected to attract more researchers towards applying CA as a potential pattern classifier to many important problems in bioinformatics.
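The clonal selection step can be illustrated with a minimal CLONALG-style search over bit strings. This is a sketch under stated assumptions, not the AIS-INMACA implementation: each candidate (antibody) is assumed to be a bit string encoding a CA rule choice, and `fitness` is a stand-in for whatever the authors actually score, for example classification accuracy on training sequences. The parameter names `n_select`, `beta` and `n_random` are hypothetical. The best antibodies are cloned in proportion to their rank, hypermutated at a rate that grows as affinity falls, and a few random newcomers are added each generation to keep diversity.

```python
import random
from typing import Callable, List, Tuple

Antibody = List[int]


def clonal_selection(
    fitness: Callable[[Antibody], float],
    n_bits: int,
    pop_size: int = 20,
    n_select: int = 5,
    beta: float = 0.5,
    n_random: int = 2,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Antibody, List[float]]:
    """CLONALG-style maximisation of `fitness` over bit strings of length n_bits."""
    if not 0 <= n_random < pop_size:
        raise ValueError("n_random must be smaller than pop_size")
    rng = random.Random(seed)

    def random_antibody() -> Antibody:
        return [rng.randint(0, 1) for _ in range(n_bits)]

    def hypermutate(ab: Antibody, rate: float) -> Antibody:
        # Flip each bit independently with probability `rate`.
        return [1 - b if rng.random() < rate else b for b in ab]

    pop = sorted((random_antibody() for _ in range(pop_size)), key=fitness, reverse=True)
    history: List[float] = []
    for _ in range(generations):
        clones: List[Antibody] = []
        for rank, ab in enumerate(pop[:n_select]):
            # Better-ranked antibodies get more clones and a lower mutation rate.
            n_clones = max(1, round(beta * pop_size / (rank + 1)))
            rate = min(1.0, (rank + 1) / n_bits)
            clones.extend(hypermutate(ab, rate) for _ in range(n_clones))
        # Parents compete with their clones, so the best antibody is never lost.
        survivors = sorted(pop + clones, key=fitness, reverse=True)[: pop_size - n_random]
        newcomers = [random_antibody() for _ in range(n_random)]
        pop = sorted(survivors + newcomers, key=fitness, reverse=True)
        history.append(fitness(pop[0]))
    return pop[0], history


best, history = clonal_selection(sum, n_bits=12, generations=30, seed=1)
print(best, history[-1])
```

Here `sum` (the number of 1 bits) is only a toy fitness. Because parents are kept in the selection pool alongside their clones, the best fitness recorded in `history` can never decrease from one generation to the next.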
Abstract: This paper reports exclusively on the efficiency of AIS-INMACA. AIS-INMACA has had a notable impact on solving major problems in bioinformatics, such as protein coding region identification and promoter region prediction, in less time (Pokkuluri Kiran Sree, 2014). AIS-INMACA now comes in several variations (Pokkuluri Kiran Sree, 2014) aimed at establishing it as a tool for solving many problems in bioinformatics. This paper should therefore be useful to the many researchers working on bioinformatics with cellular automata.
Abstract: Genes carry the instructions for making the proteins found in a cell, encoded as specific sequences of nucleotides in DNA molecules. However, the protein-coding regions of these genes may occupy only a small part of the sequence, and identifying them plays a vital role in understanding the genes. In this paper we propose an unsupervised Fuzzy Multiple Attractor Cellular Automata (FMACA) based pattern classifier to identify the coding region of a DNA sequence. We propose a distinct K-Means algorithm for designing the FMACA classifier; it is simple and efficient and produces a more accurate classifier than has previously been obtained for a range of different sequence lengths. Experimental results confirm that the proposed unsupervised FMACA-based classifier scales to large datasets irrespective of the number of classes, tuples and attributes. Good classification accuracy is achieved.
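The abstract names a "distinct K-Means algorithm" without describing how it differs from the standard one. For reference only, the sketch below is plain Lloyd's K-means with a deterministic farthest-point initialisation; it is not the authors' variant, and how its clusters feed into the FMACA design (for example, grouping feature vectors or attractor states into k classes) is an assumption left open here. The helper names `sq_dist` and `centroid` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's K-means; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-point initialisation keeps the result deterministic.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        clusters = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
        centroids = [centroid(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)  # prints [0, 0, 1, 1]
```

Farthest-point seeding is used here only so the example is reproducible; randomised seeding such as k-means++ is the more common choice in practice.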
Abstract: The human body consists of many cells, each of which contains deoxyribonucleic acid (DNA). Identifying genes in DNA sequences is a difficult task, and identifying their coding regions is even more complex. Identifying the protein-coding regions, which occupy only a small part of genes, is particularly challenging, and coding region analysis plays an important role in understanding genes. Proteins are macromolecules responsible for a wide range of vital biochemical functions, including oxygen transport, cell signaling, antibody production, nutrient transport and building muscle fibers. Promoter region identification and protein structure prediction have gained remarkable attention in recent years. Although some techniques address this problem, their accuracy in identifying the promoter region is only about 68% to 72%. We have developed a cellular automata based tool, built on a hybrid multiple attractor cellular automata (HMACA) classifier, for protein coding region identification, promoter region identification and protein structure prediction. It predicts protein coding and promoter regions with an accuracy of 76% and protein structure with an accuracy of 80%.
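CA classifiers such as HMACA operate on binary cell states, so a DNA sequence must first be mapped to bits. The abstract does not say which encoding the tool uses; a common choice, assumed here purely for illustration, is two bits per nucleotide (A=00, C=01, G=10, T=11), which turns an n-base sequence into a 2n-cell CA state. The names `NUCLEOTIDE_BITS` and `encode_dna` are hypothetical.

```python
from typing import Tuple

# Hypothetical 2-bit code; the tool's actual encoding is not given in the abstract.
NUCLEOTIDE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode_dna(seq: str) -> Tuple[int, ...]:
    """Map a DNA string to a flat tuple of bits, two per nucleotide."""
    bits = []
    for base in seq.upper():
        if base not in NUCLEOTIDE_BITS:
            raise ValueError(f"unsupported base {base!r}")
        bits.extend(NUCLEOTIDE_BITS[base])
    return tuple(bits)


print(encode_dna("ACGT"))  # prints (0, 0, 0, 1, 1, 0, 1, 1)
```

Ambiguity codes such as `N` are rejected rather than guessed; a real pipeline would need an explicit policy for them.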
Abstract: Protein structure prediction from amino acid sequences has gained remarkable attention in recent years. Although several prediction techniques address this problem, their accuracy in predicting protein structure is only about 75%. An automated procedure based on Multiple Attractor Cellular Automata (MACA), named PSMACA, was developed for predicting protein structure. Most existing approaches are sequential, classify the input into four major classes and are designed for similar sequences. PSMACA is designed to identify ten classes from sequences that share twilight-zone similarity and identity with the training sequences. The method also predicts three states (helix, strand and coil) for the structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA-based classifiers, one built for each of the ten classes. Tests on twilight-zone benchmark datasets and one high-similarity benchmark dataset, against over three dozen modern competing predictors, show that PSMACA provides the best overall accuracy, ranging between 77% and 88.7% depending on the dataset.
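The abstract mentions 10 feature selection methods without naming them. Amino acid composition, the fraction of each of the 20 standard residues in a sequence, is one widely used feature for structural class prediction; the sketch below computes it and is offered only as an example of the kind of feature vector such a classifier might consume, not as one of PSMACA's confirmed features. The function name `aa_composition` is hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, by one-letter code


def aa_composition(seq: str) -> List[float]:
    """Return the relative frequency of each standard amino acid, in AMINO_ACIDS order."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue not in counts:
            raise ValueError(f"non-standard residue {residue!r}")
        counts[residue] += 1
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


features = aa_composition("MKVLAA")
print(len(features))  # prints 20
```

The vector always has 20 entries regardless of sequence length, which is what makes composition-style features convenient for fixed-size classifiers.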
Abstract: Cellular automata (CA) have emerged as a potential classifier for addressing major problems in bioinformatics. Many bioinformatics problems, such as predicting protein coding regions, finding promoter regions and predicting protein structure, can be addressed through CA. Although some prediction techniques address these problems, their accuracy is low. An automated procedure based on Multiple Attractor Cellular Automata (MACA) is proposed that can address all of these problems. A genetic algorithm is also used to find rules with good fitness values. Extensive experiments were conducted to measure the accuracy of the proposed tool. The average accuracy of MACA, tested on the ENCODE, BG570, HMR195, Fickett and Tung, and ASP67 datasets, is 78%.
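The abstract says a genetic algorithm searches for CA rules with good fitness but gives no operators or parameters. The following is a generic sketch, assuming candidate rules are encoded as bit strings and scored by a caller-supplied `fitness` function; binary tournament selection, one-point crossover, bit-flip mutation and single-individual elitism are illustrative choices, not the authors' settings, and the function name `genetic_algorithm` is hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Genome = List[int]


def genetic_algorithm(
    fitness: Callable[[Genome], float],
    n_bits: int,
    pop_size: int = 20,
    generations: int = 40,
    p_cross: float = 0.9,
    p_mut: Optional[float] = None,
    seed: int = 0,
) -> Tuple[Genome, List[float]]:
    """Maximise `fitness` over bit strings with a simple elitist GA."""
    if n_bits < 2 or pop_size < 2:
        raise ValueError("need n_bits >= 2 and pop_size >= 2")
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits if p_mut is None else p_mut
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament() -> Genome:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    history: List[float] = []
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: copy the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with per-bit probability p_mut.
            children.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = children
        history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), history


best, history = genetic_algorithm(sum, n_bits=16, seed=3)
print(sum(best), history[-1])
```

In a rule search, `fitness` would typically train and evaluate a MACA built from the decoded rule vector; `sum` is used here only as a toy objective so the sketch runs on its own.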
Abstract: The artificial immune system (AIS), a novel computational intelligence technique, can be combined with Multiple Attractor Cellular Automata (MACA), giving AIS-MACA, to strengthen an automated protein structure prediction system by making it more adaptable and introducing more parallelism. Most existing approaches are sequential, classify the input into four major classes and are designed for similar sequences. AIS-MACA is designed to identify ten classes from sequences, including mixed and hybrid variations, that share twilight-zone similarity and identity with the training sequences. The method also predicts three states (helix, strand and coil) for the secondary structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA-based classifiers, one built for each of the ten classes. Tests on twilight-zone benchmark datasets and one high-similarity benchmark dataset, against over three dozen modern competing predictors, show that AIS-MACA provides the best overall accuracy, ranging between 80% and 89.8% depending on the dataset.
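For three-state (helix, strand, coil) predictions, per-residue accuracy is commonly reported as Q3, the percentage of residues whose predicted state matches the observed one. The abstract does not say whether its 80% to 89.8% figures are Q3 or structural-class accuracy, so the sketch below, which computes Q3 and a per-state breakdown over H/E/C strings, is illustrative only; the function names `q3` and `per_state_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Dict


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted: str, observed: str) -> Dict[str, float]:
    """For each observed state, the percentage of its residues predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be of equal length")
    totals = Counter(observed)
    hits = Counter(o for p, o in zip(predicted, observed) if p == o)
    return {state: 100.0 * hits[state] / totals[state] for state in totals}


print(q3("HHHEECCC", "HHHEECCE"))  # prints 87.5
```

The per-state breakdown is worth reporting alongside Q3, because a predictor can reach a high Q3 on coil-rich data while doing poorly on strands.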