Yunteng Geng

GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

May 25, 2024

Minghao Xu, Yunteng Geng, Yihang Zhang, Ling Yang, Jian Tang, Wentao Zhang

Abstract: Glycans are basic biomolecules and perform essential functions within living organisms. The rapid increase of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, a standard machine learning benchmark for glycan function prediction is still lacking. In this work, we fill this gap by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks, including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented by both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on the benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and suitable MTL methods can further boost model performance. We provide all datasets and source code at https://github.com/GlycanML/GlycanML and maintain a leaderboard at https://GlycanML.github.io/project

* Research project paper. All code and data are released
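
The abstract above mentions that a glycan can be represented both as a sequence and as a graph, and that multi-relational GNNs work well on the graph form. The sketch below is only an illustration of that idea, not the benchmark's actual data format or model: it assumes monosaccharides are nodes and each glycosidic linkage type is a separate edge relation. The example glycan (the N-glycan core pentasaccharide), the IUPAC-condensed style string, and the helper names `edges_by_relation` and `aggregate` are all chosen here for illustration.

```python
from collections import defaultdict

# Sequence view: the same toy glycan as a single IUPAC-condensed style string.
# (Illustrative only; the benchmark's own encoding may differ.)
SEQUENCE = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"

# Graph view: nodes are monosaccharides, edges are glycosidic bonds.
# Each edge is (source node, target node, linkage type); the linkage
# type plays the role of the edge relation.
NODES = ["Man", "Man", "Man", "GlcNAc", "GlcNAc"]
EDGES = [
    (0, 2, "a1-3"),
    (1, 2, "a1-6"),
    (2, 3, "b1-4"),
    (3, 4, "b1-4"),
]


def edges_by_relation(edges):
    """Group edges by linkage type, giving one edge list per relation."""
    grouped = defaultdict(list)
    for src, dst, rel in edges:
        grouped[rel].append((src, dst))
    return dict(grouped)


def aggregate(nodes, edges):
    """Collect, for every node, the labels of incoming neighbours per relation.

    This mimics the bookkeeping of one relation-aware message-passing step:
    messages arriving over different linkage types are kept apart instead of
    being pooled together. A real GNN would combine learned feature vectors
    here rather than string labels.
    """
    messages = {i: defaultdict(list) for i in range(len(nodes))}
    for src, dst, rel in edges:
        messages[dst][rel].append(nodes[src])
    return {i: dict(m) for i, m in messages.items()}


if __name__ == "__main__":
    print(SEQUENCE)
    print(edges_by_relation(EDGES))
    print(aggregate(NODES, EDGES)[2])
```

Keeping one edge list per linkage type is what makes the graph multi-relational: the branching mannose (node 2) receives its two incoming messages over distinct relations, so a model can treat an a1-3 branch differently from an a1-6 branch.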


Retrieval-Augmented Generation for AI-Generated Content: A Survey

Feb 29, 2024

Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Bin Cui

Abstract: The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces an information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. Then, from another perspective, we survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: https://github.com/hymie122/RAG-Survey

* Citing 259 papers, 29 pages, 8 figures. Project: https://github.com/hymie122/RAG-Survey
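
The abstract above describes RAG as retrieving relevant objects from a data store and using them to augment generation. The sketch below shows one simple way that could look, under stated assumptions: the retriever is a bag-of-words cosine similarity over a tiny in-memory list, and the retrieved text is prepended to the query as a prompt for some downstream generator. The store contents and the helper names `retrieve` and `build_prompt` are hypothetical and are not taken from the survey or its repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, store, k=2):
    """Return up to k documents from the store, most similar first.

    Documents that share no token with the query are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, retrieved):
    """Augment the query by placing the retrieved text in front of it."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical in-memory data store used only for this illustration.
STORE = [
    "Glycans can be represented as sequences or as graphs.",
    "Retrieval-augmented generation retrieves objects from a data store.",
    "Multi-task learning trains one model on several tasks at once.",
]

if __name__ == "__main__":
    query = "What does retrieval-augmented generation retrieve?"
    print(build_prompt(query, retrieve(query, STORE, k=1)))
```

The resulting prompt would then be passed to a generator of the reader's choice. Feeding retrieved text into the generator's input is only one of several ways a retriever can augment a generator; it is used here because it needs no access to model internals.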
