Abstract:Bioinformatics incorporates information regarding biological data storage, accessing mechanisms and presentation of characteristics within this data. Most of the problems in bioinformatics and be addressed efficiently by computer techniques. This paper aims at building a classifier based on Multiple Attractor Cellular Automata (MACA) which uses fuzzy logic with version Z to predict splicing site, protein coding and promoter region identification in eukaryotes. It is strengthened with an artificial immune system technique (AIS), Clonal algorithm for choosing rules of best fitness. The proposed classifier can handle DNA sequences of lengths 54,108,162,252,354. This classifier gives the exact boundaries of both protein and promoter regions with an average accuracy of 90.6%. This classifier can predict the splicing site with 97% accuracy. This classifier was tested with 1, 97,000 data components which were taken from Fickett & Toung , EPDnew, and other sequences from a renowned medical university.
Abstract:This paper aims at providing a survey on the problems that can be easily addressed by cellular automata in bioinformatics. Some of the authors have proposed algorithms for addressing some problems in bioinformatics but the application of cellular automata in bioinformatics is a virgin field in research. None of the researchers has tried to relate the major problems in bioinformatics and find a common solution. Extensive literature surveys were conducted. We have considered some papers in various journals and conferences for conduct of our research. This paper provides intuition towards relating various problems in bioinformatics logically and tries to attain a common frame work for addressing the same.
Abstract:Human body consists of lot of cells, each cell consist of DeOxaRibo Nucleic Acid (DNA). Identifying the genes from the DNA sequences is a very difficult task. But identifying the coding regions is more complex task compared to the former. Identifying the protein which occupy little place in genes is a really challenging issue. For understating the genes coding region analysis plays an important role. Proteins are molecules with macro structure that are responsible for a wide range of vital biochemical functions, which includes acting as oxygen, cell signaling, antibody production, nutrient transport and building up muscle fibers. Promoter region identification and protein structure prediction has gained a remarkable attention in recent years. Even though there are some identification techniques addressing this problem, the approximate accuracy in identifying the promoter region is closely 68% to 72%. We have developed a Cellular Automata based tool build with hybrid multiple attractor cellular automata (HMACA) classifier for protein coding region, promoter region identification and protein structure prediction which predicts the protein and promoter regions with an accuracy of 76%. This tool also predicts the structure of protein with an accuracy of 80%.
Abstract:Protein Structure Predication from sequences of amino acid has gained a remarkable attention in recent years. Even though there are some prediction techniques addressing this problem, the approximate accuracy in predicting the protein structure is closely 75%. An automated procedure was evolved with MACA (Multiple Attractor Cellular Automata) for predicting the structure of the protein. Most of the existing approaches are sequential which will classify the input into four major classes and these are designed for similar sequences. PSMACA is designed to identify ten classes from the sequences that share twilight zone similarity and identity with the training sequences. This method also predicts three states (helix, strand, and coil) for the structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA (Multiple Attractor Cellular Automata) based classifiers that are build for each of the ten classes. We have tested the proposed classifier with twilight-zone and 1-high-similarity benchmark datasets with over three dozens of modern competing predictors shows that PSMACA provides the best overall accuracy that ranges between 77% and 88.7% depending on the dataset.