Yiyu Liu

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Feb 28, 2025

Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang(+4 more)

Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large datastores, leading to system challenges in latency-sensitive deployments, especially when limited GPU memory is available. To address these challenges, we propose TeleRAG, an efficient inference system that reduces RAG latency with minimal GPU memory requirements. The core innovation of TeleRAG is lookahead retrieval, a prefetching mechanism that anticipates required data and transfers it from CPU to GPU in parallel with LLM generation. By leveraging the modularity of RAG pipelines, the inverted file index (IVF) search algorithm and similarities between queries, TeleRAG optimally overlaps data movement and computation. Experimental results show that TeleRAG reduces end-to-end RAG inference latency by up to 1.72x on average compared to state-of-the-art systems, enabling faster, more memory-efficient deployments of advanced RAG applications.

Concept-Aware Denoising Graph Neural Network for Micro-Video Recommendation

Sep 28, 2021

Yiyu Liu, Qian Liu, Yu Tian, Changping Wang, Yanan Niu, Yang Song, Chenliang Li

Abstract: Recently, micro-video sharing platforms such as Kuaishou and TikTok have become a major source of information in people's lives. Given the large traffic volume, short video lifespan and streaming nature of these services, it has become increasingly pressing to improve existing recommender systems to accommodate these challenges in a cost-effective way. In this paper, we propose a novel concept-aware denoising graph neural network (named CONDE) for micro-video recommendation. CONDE consists of a three-phase graph convolution process to derive user and micro-video representations: warm-up propagation, graph denoising and preference refinement. A heterogeneous tripartite graph is constructed by connecting user nodes with video nodes, and video nodes with associated concept nodes extracted from the captions and comments of the videos. To address the noisy information in the graph, we introduce a user-oriented graph denoising phase to extract a subgraph that better reflects the user's preferences. Although this paper focuses on micro-video recommendation, we also show that our method generalizes to other types of tasks by conducting empirical studies on a well-known public e-commerce dataset. The experimental results suggest that the proposed CONDE achieves significantly better recommendation performance than existing state-of-the-art solutions.

* 9 pages

MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

May 23, 2020

Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, Chenliang Li

Abstract: Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on this rich information, MATINF is applicable to three major NLP tasks: classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline on MATINF to inspire further research. Our comprehensive comparison and experiments on MATINF and other datasets demonstrate the merits of MATINF.

* Accepted as a long paper at ACL 2020
