Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Nov 04, 2024

Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li

Figure 1 for MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Figure 2 for MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Figure 3 for MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Figure 4 for MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Share this with someone who'll enjoy it:

Abstract:Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome, regulating protein attributes and interactions that are crucial for biological processes. Accurately predicting PTM sites and their specific types is therefore essential for elucidating protein function and understanding disease mechanisms. Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs. However, these approaches often overlook protein structural contexts. In this work, we first compile a large-scale sequence-structure PTM dataset, which serves as the foundation for fair comparison. We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens. This model not only captures the typical sequence motifs associated with PTMs but also leverages the spatial arrangements dictated by protein tertiary structures, thus providing a holistic view of the factors influencing PTM sites. Designed to address the long-tail distribution of PTM types, MeToken employs uniform sub-codebooks that ensure even the rarest PTMs are adequately represented and distinguished. We validate the effectiveness and generalizability of MeToken across multiple datasets, demonstrating its superior performance in accurately identifying PTM types. The results underscore the importance of incorporating structural data and highlight MeToken's potential in facilitating accurate and comprehensive PTM predictions, which could significantly impact proteomics research. The code and datasets are available at https://github.com/A4Bio/MeToken.

* 26 pages, 20 figures, 10 tables

View paper on

Share this with someone who'll enjoy it:

Title:MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Paper and Code