Abstract: The development of therapeutic targets for COVID-19 treatment relies on understanding the molecular mechanisms of its pathogenesis. Identifying the genes and proteins involved in the infection mechanism is key to shedding light on these complex molecular mechanisms. The combined effort of many laboratories around the world has produced a growing body of both protein and genetic interaction data. In this work, we integrate these available results to obtain a host protein-protein interaction network composed of 1432 human proteins. We calculate network centrality measures to identify key proteins, and then perform functional enrichment analysis of the central proteins. We observe that the identified proteins are mostly associated with several crucial pathways, including cellular processes, signal transduction, and neurodegenerative disease. Finally, we focus on proteins involved in causing disease in the human respiratory tract. We conclude that COVID-19 is a complex disease, and we highlight many potential therapeutic targets, including RBX1, HSPA5, ITCH, RAB7A, RAB5A, RAB8A, PSMC5, CAPZB, CANX, IGF2R, and HSPA1A, which are both central in the network and associated with multiple diseases.
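The abstract does not say which centrality measures were used. As a hedged illustration only, the sketch below computes two common ones, normalized degree centrality and BFS-based closeness centrality, on an undirected protein-protein interaction graph and ranks nodes by score. The protein labels P1 to P5 and their edges are hypothetical toy data, not taken from the study's 1432-protein network, and the choice of degree and closeness is an assumption.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) interaction pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Normalized degree centrality: deg(v) / (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Closeness centrality via BFS on an unweighted graph.

    Score = (number of other reachable nodes) / (sum of shortest-path distances).
    On a connected graph this equals (n - 1) / sum of distances.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1
        scores[source] = reachable / total if total > 0 else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring nodes (ties broken by name)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical toy interaction network (NOT data from the study).
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
adj = build_adjacency(EDGES)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
```

In this toy graph P1 interacts with three of the four other proteins, so it ranks first on both measures. Proteins that score highly on several measures at once are natural candidates for the downstream enrichment step.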
Abstract: Bioinformatics research is characterized by voluminous, incrementally growing datasets and complex data analytics methods. The machine learning methods used in bioinformatics are typically iterative and parallelizable, so they can be scaled to handle big data using distributed and parallel computing technologies. Yet big data tools usually perform computation in batch mode and are not optimized for iterative processing or for high data dependency among operations. In recent years, parallel, incremental, and multi-view machine learning algorithms have been proposed. Similarly, graph-based architectures and in-memory big data tools have been developed to minimize I/O cost and optimize iterative processing. However, standard big data architectures and tools are still lacking for many important bioinformatics problems, such as fast construction of co-expression and regulatory networks and identification of salient modules, detection of complexes in growing protein-protein interaction data, fast analysis of massive DNA, RNA, and protein sequence data, and fast querying of incremental and heterogeneous disease networks. This paper addresses the issues and challenges posed by several big data problems in bioinformatics and gives an overview of the state of the art and future research opportunities.
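To make "fast construction of co-expression networks" over incremental data concrete, here is a minimal sketch of one possible approach. It is not the paper's method. It keeps running sums per gene and per gene pair, so each Pearson correlation can be refreshed when a new expression sample arrives without rescanning earlier samples. A gene pair becomes an edge when |r| exceeds a threshold. The class name `IncrementalCoexpression`, the gene labels G1 to G3, the sample values, and the threshold are all hypothetical.

```python
import math
from itertools import combinations


class IncrementalCoexpression:
    """Hypothetical incremental co-expression builder.

    Stores n, sum(x), sum(x^2) per gene and sum(x*y) per gene pair,
    so Pearson correlations are updated in O(g^2) per new sample
    without revisiting previously seen samples.
    """

    def __init__(self, genes):
        self.genes = list(genes)
        self.n = 0
        self.s = {g: 0.0 for g in self.genes}
        self.ss = {g: 0.0 for g in self.genes}
        self.sp = {pair: 0.0 for pair in combinations(self.genes, 2)}

    def add_sample(self, values):
        """Fold one sample (mapping gene -> expression value) into the running sums."""
        self.n += 1
        for g in self.genes:
            x = values[g]
            self.s[g] += x
            self.ss[g] += x * x
        for a, b in self.sp:
            self.sp[(a, b)] += values[a] * values[b]

    def correlation(self, a, b):
        """Pearson r from the running sums; 0.0 if either gene has zero variance."""
        key = (a, b) if (a, b) in self.sp else (b, a)
        n = self.n
        num = n * self.sp[key] - self.s[a] * self.s[b]
        var_a = n * self.ss[a] - self.s[a] ** 2
        var_b = n * self.ss[b] - self.s[b] ** 2
        if var_a <= 0 or var_b <= 0:
            return 0.0
        return num / math.sqrt(var_a * var_b)

    def edges(self, threshold):
        """Gene pairs whose absolute correlation meets the threshold."""
        return [(a, b) for a, b in self.sp
                if abs(self.correlation(a, b)) >= threshold]


# Hypothetical expression samples arriving one at a time.
net = IncrementalCoexpression(["G1", "G2", "G3"])
net.add_sample({"G1": 1.0, "G2": 2.0, "G3": 5.0})
net.add_sample({"G1": 2.0, "G2": 4.0, "G3": 1.0})
net.add_sample({"G1": 3.0, "G2": 6.0, "G3": 4.0})
```

The design trades memory for speed. Pair sums grow quadratically with the number of genes, which is exactly the kind of state a distributed or in-memory tool would need to partition. The raw-sum formula can also lose precision on large values, and Welford-style updates are a more numerically stable alternative.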