Xiaonan Zhang

Precise Antigen-Antibody Structure Predictions Enhance Antibody Development with HelixFold-Multimer

Dec 13, 2024

Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Kunrui Zhu, Xiaonan Zhang, Xiaomin Fang

Abstract:The accurate prediction of antigen-antibody structures is essential for advancing immunology and therapeutic development, as it helps elucidate molecular interactions that underlie immune responses. Despite recent progress with deep learning models like AlphaFold and RoseTTAFold, accurately modeling antigen-antibody complexes remains a challenge due to their unique evolutionary characteristics. HelixFold-Multimer, a specialized model developed for this purpose, builds on the framework of AlphaFold-Multimer and demonstrates improved precision for antigen-antibody structures. HelixFold-Multimer not only surpasses other models in accuracy but also provides essential insights into antibody development, enabling more precise identification of binding sites, improved interaction prediction, and enhanced design of therapeutic antibodies. These advances underscore HelixFold-Multimer's potential in supporting antibody research and therapeutic innovation.

Technical Report of HelixFold3 for Biomolecular Structure Prediction

Aug 30, 2024

Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

Abstract:The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predictions, AlphaFold3 remains only partially accessible through a limited online server and has not been open-sourced, restricting further development. To address these challenges, the PaddleHelix team is developing HelixFold3, aiming to replicate AlphaFold3's capabilities. Using insights from previous models and extensive datasets, HelixFold3 achieves accuracy comparable to AlphaFold3 in predicting the structures of conventional ligands, nucleic acids, and proteins. The initial release of HelixFold3 is available as open source on GitHub for academic research, promising to advance biomolecular research and accelerate discoveries. We also provide an online service on the PaddleHelix website at https://paddlehelix.baidu.com/app/all/helixfold3/forecast.

FedAR: Addressing Client Unavailability in Federated Learning with Local Update Approximation and Rectification

Jul 26, 2024

Chutian Jiang, Hansong Zhou, Xiaonan Zhang, Shayok Chakraborty

Abstract:Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The existence of unavailable clients severely deteriorates the overall FL performance. In this paper, we propose FedAR, a novel client update Approximation and Rectification algorithm for FL to address the client unavailability issue. FedAR can get all clients involved in the global model update to achieve a high-quality global model on the server, which also furnishes accurate predictions for each client. To this end, the server uses the latest update from each client as a surrogate for its current update. It then assigns a different weight to each client's surrogate update to derive the global model, in order to guarantee contributions from both available and unavailable clients. Our theoretical analysis proves that FedAR achieves optimal convergence rates on non-IID datasets for both convex and non-convex smooth loss functions. Extensive empirical studies show that FedAR comprehensively outperforms state-of-the-art FL baselines including FedAvg, MIFA, FedVARP and Scaffold in terms of training loss, test accuracy, and bias mitigation. Moreover, FedAR also delivers impressive performance in the presence of a large number of clients with severe client unavailability.

* 18 pages, ECML 2024
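
The surrogate-update idea described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the FedAR algorithm itself: the abstract does not say how the per-client weights are chosen, so the staleness-based weighting here (the `decay` parameter), the `SurrogateServer` class, and its `round` method are all hypothetical names and assumptions introduced only for this example.

```python
import numpy as np


class SurrogateServer:
    """Toy server that reuses each client's latest update as a surrogate.

    Assumption: surrogate weights shrink geometrically with staleness.
    The weighting rule used in the paper is not given in the abstract.
    """

    def __init__(self, model, num_clients, decay=0.5):
        self.model = np.asarray(model, dtype=float)
        # Latest known update per client (zeros until the client first reports).
        self.latest = [np.zeros_like(self.model) for _ in range(num_clients)]
        # Number of rounds since each client last reported.
        self.staleness = np.zeros(num_clients, dtype=int)
        self.decay = decay

    def round(self, fresh_updates):
        """Run one round; fresh_updates maps client id -> update vector."""
        self.staleness += 1
        for cid, update in fresh_updates.items():
            self.latest[cid] = np.asarray(update, dtype=float)
            self.staleness[cid] = 0
        # Every client contributes, available or not; stale surrogates count less.
        weights = self.decay ** self.staleness
        weights = weights / weights.sum()
        self.model = self.model + sum(w * u for w, u in zip(weights, self.latest))
        return self.model


server = SurrogateServer(np.zeros(2), num_clients=2, decay=0.5)
print(server.round({0: [1.0, 0.0], 1: [0.0, 1.0]}))  # both clients available
print(server.round({0: [2.0, 0.0]}))  # client 1 unavailable, its last update is reused
```

In the second round, client 1 sends nothing, yet its previous update still enters the global step with a smaller weight, which is the behaviour the abstract describes as guaranteeing contributions from both available and unavailable clients.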

Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX

Jul 12, 2024

Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang

Abstract:Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated potential capabilities in generating any-to-any content like text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon the large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.

HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

Apr 16, 2024

Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

Abstract:While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Because complex prediction accuracy is limited, tasks that depend on precise protein-protein interaction analysis also face obstacles. In this report, we highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer, underscoring its enhanced performance. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. Notably, HelixFold-Multimer achieves remarkable success in antigen-antibody and peptide-protein structure prediction, surpassing AlphaFold-Multimer severalfold. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version. Researchers can conveniently access and utilize this service for their development needs.

Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

Oct 21, 2023

Lihang Liu, Donglong He, Xianbin Ye, Shanzhuo Zhang, Xiaonan Zhang, Jingbo Zhou, Jun Li, Hua Chai, Fan Wang, Jingzhou He(+3 more)

Abstract:Molecular docking, a pivotal computational tool for drug discovery, predicts the binding interactions between small molecules (ligands) and target proteins (receptors). Conventional physics-based docking tools, though widely used, face limitations in precision due to restricted conformational sampling and imprecise scoring functions. Recent endeavors have employed deep learning techniques to enhance docking accuracy, but their generalization remains a concern due to limited training data. Leveraging the success of extensive and diverse data in other domains, we introduce HelixDock, a novel approach for site-specific molecular docking. Hundreds of millions of binding poses are generated by traditional docking tools, encompassing diverse protein targets and small molecules. Our deep learning-based docking model, an SE(3)-equivariant network, is pre-trained with this large-scale dataset and then fine-tuned with a small number of precise receptor-ligand complex structures. Comparative analyses against physics-based and deep learning-based baseline methods highlight HelixDock's superiority, especially on challenging test sets. Our study elucidates the scaling laws of the pre-trained molecular docking models, showcasing consistent improvements with increased model parameters and pre-training data quantities. Harnessing the power of extensive and diverse generated data holds promise for advancing AI-driven drug discovery.

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Aug 09, 2022

Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Xiaonan Zhang, Hua Wu, Hui Li, Le Song

Abstract:AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching for MSAs in protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) on billions of primary sequences using the self-supervised learning paradigm; the PLM is then used as an alternative to MSAs for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. HelixFold-Single is validated on the CASP14 and CAMEO datasets, achieving accuracy competitive with the MSA-based methods on targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions. The code of HelixFold-Single is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein-single/forecast.

TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response

Jul 10, 2022

Jie Gao, Jing Hu, Wanqing Sun, Yili Shen, Xiaonan Zhang, Xiaomin Fang, Fan Wang, Guodong Zhao

Abstract:Predicting clinical outcomes of anti-cancer drugs on a personalized basis is challenging in cancer treatment due to the heterogeneity of tumors. Traditional computational efforts have been made to model the effect of drug response on individual samples depicted by their molecular profile, yet overfitting occurs because of the high dimensionality of omics data, hindering models from clinical application. Recent research shows that deep learning is a promising approach to build drug response models by learning alignment patterns between drugs and samples. However, existing studies employed a simple feature fusion strategy and only considered the drug features as a whole representation while ignoring the substructure information that may play a vital role when aligning drugs and genes. In this paper, we propose TCR (Transformer based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to efficiently learn the interactions between drug atoms/substructures and molecular signatures. Furthermore, a dual loss function and cross sampling strategy were designed to improve the prediction power of TCR. We show that TCR outperformed all other methods under various data splitting strategies on all evaluation metrics (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurposing and precision oncology treatment.

* 11 pages, 7 figures

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

May 17, 2022

Shanzhuo Zhang, Zhiyuan Yan, Yueyang Huang, Lihang Liu, Donglong He, Wei Wang, Xiaomin Fang, Xiaonan Zhang, Fan Wang, Hua Wu(+1 more)

Abstract:Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting the varied requirements of drug research and development.

Study on MCS Selection and Spectrum Allocation for URLLC Traffic under Delay and Reliability Constraint in 5G Network

Jan 06, 2021

Yuehong Gao, Changhao Sun, Xiaonan Zhang, Xiao Hong

Abstract:Supporting Ultra-Reliable and Low Latency Communications (URLLC) is an essential characteristic of the 5th Generation (5G) communication system. Unlike the other two use cases defined in 5G, i.e., enhanced Mobile Broadband (eMBB) and massive Machine Type Communications (mMTC), URLLC traffic has strict delay and reliability requirements. In this paper, an analysis model for URLLC traffic is proposed, covering the path from the generation of URLLC traffic to its transmission over a wireless channel, where channel quality, coding scheme with finite coding length, modulation scheme and allocated spectrum resource are taken into consideration. Then, network calculus analysis is applied to derive the delay guarantee for periodic URLLC traffic. Based on the delay analysis, the admission region is found under given delay and reliability requirements, which gives a lower bound on the required spectrum resources. Theoretical results in the scenario of a 5G New Radio system are presented, where the SNR thresholds for adaptive modulation and coding scheme selection, transmission rate and delay, as well as the admission region under different configurations are discussed. In addition, simulation results are obtained and compared with the theoretical results, validating that the admission region derived in this work provides a lower bound on spectrum allocation.
