Kai Zhong

RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models

Feb 04, 2025

Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, Xi Xie, Kuangzheng Li, Shuya Feng, Kai Zhong, Caiwen Ding (+1 more)

Abstract: In an Information Retrieval (IR) system, reranking plays a critical role by sorting candidate passages according to their relevance to a specific query. This process demands a nuanced understanding of the variations among passages linked to the query. In this work, we introduce RankFlow, a multi-role reranking workflow that leverages the capabilities of Large Language Models (LLMs) and role specializations to improve reranking performance. RankFlow enlists LLMs to fulfill four distinct roles: the query Rewriter, the pseudo Answerer, the passage Summarizer, and the Reranker. This orchestrated approach enables RankFlow to: (1) accurately interpret queries, (2) draw upon LLMs' extensive pre-existing knowledge, (3) distill passages into concise versions, and (4) assess passages in a comprehensive manner, resulting in notably better reranking results. Our experimental results reveal that RankFlow outperforms existing leading approaches on widely recognized IR benchmarks, such as TREC-DL, BEIR, and NovelEval. Additionally, we investigate the individual contributions of each role in RankFlow. Code is available at https://github.com/jincan333/RankFlow.
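
A minimal sketch of how the four roles could be chained, assuming a hypothetical `llm` callable (prompt in, text out) and invented prompt wording. The Reranker here is simplified to pointwise 0 to 10 scoring parsed from the model's reply; that is an illustrative assumption, not RankFlow's actual ranking strategy, and the real prompts live in the repository linked above.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical LLM interface (an assumption): prompt in, completion out.
Llm = Callable[[str], str]


def rankflow_sketch(llm: Llm, query: str, passages: List[str]) -> List[Tuple[int, float]]:
    """Run the four roles in order and return (passage_index, score) pairs, best first."""
    # Role 1: the Rewriter clarifies the query.
    rewritten = llm(f"Rewrite this search query to be clear and specific: {query}")
    # Role 2: the pseudo Answerer drafts an answer from the model's own knowledge.
    pseudo_answer = llm(f"Answer briefly: {rewritten}")
    # Role 3: the Summarizer condenses each passage with the query in mind.
    summaries = [llm(f"Summarize the passage for the query '{rewritten}': {p}") for p in passages]
    # Role 4: the Reranker scores each summary (pointwise scoring is a simplification).
    scored: List[Tuple[int, float]] = []
    for i, summary in enumerate(summaries):
        reply = llm(
            f"Query: {rewritten}\nReference answer: {pseudo_answer}\n"
            f"Passage: {summary}\nRate relevance from 0 to 10. Reply with a number."
        )
        match = re.search(r"\d+(?:\.\d+)?", reply)  # first number in the reply, else 0
        scored.append((i, float(match.group()) if match else 0.0))
    # sorted() is stable, so tied passages keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing a stub function as `llm` makes the control flow testable without a model.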

Joint Beamforming and Antenna Position Design for IRS-Aided Multi-User Movable Antenna Systems

Oct 01, 2024

Yue Geng, Tee Hiang Cheng, Kai Zhong, Kah Chan Teh, Qingqing Wu

Abstract: Intelligent reflecting surface (IRS) and movable antenna (MA) technologies have been proposed to enhance wireless communications by creating favorable channel conditions. This paper investigates the joint beamforming and antenna position design for an MA-enabled IRS (MA-IRS)-aided multi-user multiple-input single-output (MU-MISO) communication system, where the MA-IRS is deployed to aid the communication between the MA-enabled base station (BS) and user equipment (UE). In contrast to the conventional fixed position antenna (FPA)-enabled IRS (FPA-IRS), the MA-IRS enhances the wireless channel by controlling the positions of the reflecting elements. To verify the system's effectiveness and optimize its performance, we formulate a sum-rate maximization problem with a minimum rate threshold constraint for the MU-MISO communication. To tackle this non-convex problem, a product Riemannian manifold optimization (PRMO) method is proposed for the joint design of the beamforming and MA positions. Specifically, a product Riemannian manifold space (PRMS) is constructed and the corresponding Riemannian gradient is derived for updating the variables, and the Riemannian exact penalty (REP) method and a Riemannian Broyden-Fletcher-Goldfarb-Shanno (RBFGS) algorithm are derived to obtain a feasible solution over the PRMS. Simulation results demonstrate that, compared with conventional FPA-IRS-aided MU-MISO communication, the reflecting elements of the MA-IRS can move to positions with higher channel gain, thus enhancing system performance. Furthermore, integrating MA with the IRS is shown to yield higher performance gains than integrating MA with the BS.

* 13 pages, 11 figures
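
The PRMO method itself is not reproduced here. As a hedged illustration of one ingredient common to Riemannian approaches for IRS design, the sketch below runs gradient ascent on the complex circle manifold, assuming unit-modulus reflection coefficients and a toy single-user objective f(theta) = |a^H theta|^2, where `a` is a hypothetical cascaded channel vector. Element positions, BS beamforming, the rate constraints, REP, and RBFGS are all omitted; the step size and iteration count are arbitrary choices.

```python
import numpy as np


def riemannian_ascent_unit_modulus(a: np.ndarray, iters: int = 5000, seed: int = 0) -> np.ndarray:
    """Maximize f(theta) = |a^H theta|^2 subject to |theta_i| = 1 (complex circle manifold)."""
    rng = np.random.default_rng(seed)
    theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, a.size))  # random feasible starting point
    step = 0.1 / np.vdot(a, a).real  # conservative step size scaled by ||a||^2
    for _ in range(iters):
        s = np.vdot(a, theta)  # a^H theta (vdot conjugates its first argument)
        egrad = 2.0 * a * s  # Euclidean gradient of |s|^2 with respect to conj(theta)
        # Project onto the tangent space of the complex circle manifold at theta.
        rgrad = egrad - np.real(egrad * np.conj(theta)) * theta
        # Ascent step, then retraction: normalize each entry back to unit modulus.
        theta = theta + step * rgrad
        theta = theta / np.abs(theta)
    return theta
```

For this toy objective the optimum is known in closed form (theta_i = exp(j arg a_i), giving (sum_i |a_i|)^2), which makes it easy to check that the projection-plus-retraction update behaves as expected.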

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

Jun 20, 2024

Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs relies heavily on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly reranking, underexplored. Directly applying current prompt engineering algorithms to relevance ranking is challenging because the input combines a query with long passages, and the ranking complexity surpasses that of classification tasks. To reduce human effort and unlock the potential of prompt optimization in reranking, we introduce a novel automatic prompt engineering algorithm named APEER. APEER iteratively generates refined prompts through feedback and preference optimization. Extensive experiments with four LLMs and ten datasets demonstrate the substantial performance improvement of APEER over existing state-of-the-art (SoTA) manual prompts. Furthermore, we find that the prompts generated by APEER exhibit better transferability across diverse tasks and LLMs. Code is available at https://github.com/jincan333/APEER.
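
APEER's exact prompts, feedback format, and preference optimization are not described in the abstract, so the loop below is only a schematic. It assumes two hypothetical callables: `evaluate`, which scores a prompt on held-out ranking data, and `propose`, which asks an LLM for revised prompts given textual feedback and the accumulated (preferred, dispreferred) prompt pairs.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not APEER's actual API).
Evaluate = Callable[[str], float]  # prompt -> ranking quality on a dev set
Propose = Callable[[str, str, List[Tuple[str, str]]], List[str]]  # prompt, feedback, prefs -> candidates


def optimize_prompt(seed_prompt: str, evaluate: Evaluate, propose: Propose, rounds: int = 3) -> str:
    """Iteratively refine a reranking prompt, keeping the best-scoring one."""
    best, best_score = seed_prompt, evaluate(seed_prompt)
    preferences: List[Tuple[str, str]] = []  # (preferred, dispreferred) prompt pairs
    for _ in range(rounds):
        # Placeholder feedback; in practice an LLM would critique the prompt on examples.
        feedback = f"Current prompt scored {best_score:.3f}; suggest improvements."
        for candidate in propose(best, feedback, preferences):
            score = evaluate(candidate)
            if score > best_score:
                preferences.append((candidate, best))
                best, best_score = candidate, score
            else:
                preferences.append((best, candidate))
    return best
```

Recording pairs rather than only the winner gives the proposer contrastive signal about which edits helped, which is the intuition behind using preferences in the loop.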

FOSS: A Self-Learned Doctor for Query Optimizer

Dec 11, 2023

Kai Zhong, Luming Sun, Tao Ji, Cuiping Li, Hong Chen

Abstract: Various works have utilized deep reinforcement learning (DRL) to address the query optimization problem in database systems. They either learn to construct plans from scratch in a bottom-up manner or use hints to guide the plan generation behavior of a traditional optimizer. While these methods have achieved some success, they suffer from either low training efficiency or a limited plan search space. To address these challenges, we introduce FOSS, a novel DRL-based framework for query optimization. FOSS starts optimization from the original plan generated by a traditional optimizer and incrementally refines suboptimal nodes of the plan through a sequence of actions. Additionally, we devise an asymmetric advantage model to evaluate the advantage between two plans, and integrate it with a traditional optimizer to form a simulated environment. Leveraging this simulated environment, FOSS can bootstrap itself to rapidly generate a large amount of high-quality simulated experience, from which it learns and improves its optimization capability. We evaluate FOSS on the Join Order Benchmark, TPC-DS, and Stack Overflow. The experimental results demonstrate that FOSS outperforms state-of-the-art methods in terms of latency and optimization time. Compared to PostgreSQL, FOSS reduces total latency by 15% to 83% across the different benchmarks.
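
FOSS learns its refinement policy with DRL; the sketch below replaces that policy with greedy search purely to show the control flow of starting from the optimizer's plan and applying node-level actions that an advantage model judges beneficial. The plan representation (a list of join operators), the action space, and the `advantage` callable are hypothetical simplifications, not FOSS's actual design.

```python
from typing import Callable, List, Optional, Sequence

JOIN_METHODS = ("hash", "merge", "nestloop")  # illustrative physical operators

# Hypothetical advantage model: > 0 means plan `a` is predicted to beat plan `b`.
# It may be asymmetric: advantage(a, b) need not equal -advantage(b, a).
Advantage = Callable[[Sequence[str], Sequence[str]], float]


def refine_plan(original: Sequence[str], advantage: Advantage,
                max_steps: int = 10, margin: float = 0.0) -> List[str]:
    """Greedily apply single-node actions while the advantage model predicts a gain."""
    plan = list(original)  # start from the traditional optimizer's plan
    for _ in range(max_steps):
        best_candidate: Optional[List[str]] = None
        best_gain = margin
        # Enumerate actions: change the operator at one join node.
        for node in range(len(plan)):
            for method in JOIN_METHODS:
                if method == plan[node]:
                    continue
                candidate = plan[:node] + [method] + plan[node + 1:]
                gain = advantage(candidate, plan)
                if gain > best_gain:
                    best_candidate, best_gain = candidate, gain
        if best_candidate is None:  # no action is predicted to help, so stop
            break
        plan = best_candidate
    return plan
```

In FOSS the advantage model is learned and also drives a simulated environment for training; here it is simply passed in, so a toy cost table can stand in for it.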

Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects

Dec 24, 2021

Mian Guo, Kai Zhong, Xiaozhi Wang

Abstract: We propose a Doppler velocity-based clustering and velocity estimation algorithm, based on the characteristics of FMCW LiDAR, that achieves highly accurate, single-scan, real-time motion state detection and velocity estimation. We prove the continuity of the Doppler velocity on the same object. Based on this principle, we distinguish moving objects from the stationary background using a region-growing clustering algorithm. The obtained stationary background is used to estimate the velocity of the FMCW LiDAR by the least-squares method. We then estimate the velocity of the moving objects using the estimated LiDAR velocity and the Doppler velocity of the moving objects obtained by clustering. To ensure real-time processing, we set appropriate least-squares parameters. To verify the effectiveness of the algorithm, we build an FMCW LiDAR model on the autonomous driving simulation platform CARLA to generate data. The results show that, on a Ryzen 3600X CPU, our algorithm can process at least 4.5 million points and estimate the velocity of 150 moving objects per second, with a motion state detection accuracy of over 99% and a velocity estimation accuracy of 0.1 m/s.

* 7 pages, 9 figures, 2 tables, 2 algorithms, CACRE2022
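
The least-squares ego-velocity step can be sketched as follows, assuming the stationary points have already been separated by clustering, positive Doppler values mean increasing range, and there is no noise or outlier handling. For a stationary point along unit line-of-sight direction d, a sensor moving with velocity v observes radial velocity -d·v, so stacking all stationary points gives an overdetermined linear system in v. The function names are illustrative.

```python
import numpy as np


def estimate_ego_velocity(points: np.ndarray, doppler: np.ndarray) -> np.ndarray:
    """Least-squares sensor velocity from stationary points.

    points:  (N, 3) positions in the sensor frame
    doppler: (N,) measured radial velocities, positive when range increases (assumed convention)
    """
    directions = points / np.linalg.norm(points, axis=1, keepdims=True)  # unit line-of-sight vectors
    # Each stationary point contributes one equation: -d_i . v = doppler_i.
    v, *_ = np.linalg.lstsq(directions, -doppler, rcond=None)
    return v


def compensated_radial_velocity(points: np.ndarray, doppler: np.ndarray,
                                ego_velocity: np.ndarray) -> np.ndarray:
    """Radial velocity after removing the sensor's own motion (close to 0 for static points)."""
    directions = points / np.linalg.norm(points, axis=1, keepdims=True)
    return doppler + directions @ ego_velocity
```

Points whose compensated radial velocity stays far from zero are candidates for moving objects; the paper's thresholds and region-growing details are not reproduced here.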

Extreme Multi-label Learning for Semantic Matching in Product Search

Jun 23, 2021

Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov (+2 more)

Abstract: We consider the problem of semantic matching in product search: given a customer query, retrieve all semantically related products from a huge catalog of 100 million products or more. Because of the large catalog size and real-time latency constraints, semantic matching algorithms need not only high recall but also low latency. Conventional lexical matching approaches (e.g., Okapi-BM25) exploit inverted indices to achieve fast inference, but fail to capture behavioral signals between queries and products. In contrast, embedding-based models learn semantic representations from customer behavior data, but their performance is often limited by the shallow neural encoders that latency constraints impose. Semantic product search can be viewed as an eXtreme Multi-label Classification (XMC) problem, where customer queries are input instances and products are output labels. In this paper, we aim to improve semantic product search by using tree-based XMC models, whose inference time complexity is logarithmic in the number of products. We consider hierarchical linear models with n-gram features for fast real-time inference. Quantitatively, our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement in Recall@100 (60.9% vs. 36.8%) over a competing embedding-based DSSM model. Our model is robust to weight pruning with varying thresholds, which can flexibly meet different system requirements for online deployments. Qualitatively, our method can retrieve products that are complementary to the existing product search system and add diversity to the match set.

* Accepted in KDD 2021 Applied Data Science Track
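
The logarithmic inference cost of tree-based XMC comes from descending a label tree and expanding only a fixed number of nodes per level. A minimal beam-search sketch, assuming a hypothetical tree stored as a child-list dictionary, one linear scorer per node, and path scores formed as products of sigmoids; the paper's exact scoring and n-gram feature pipeline are not reproduced.

```python
from typing import Dict, List, Tuple

import numpy as np


def beam_search(x: np.ndarray, children: Dict[int, List[int]], weights: Dict[int, np.ndarray],
                root: int = 0, beam: int = 2, topk: int = 3) -> List[Tuple[int, float]]:
    """Return the top-k leaves (products) reached by expanding only `beam` nodes per level.

    children: node id -> child ids (leaves have no entry)
    weights:  node id -> linear scorer for that node (hypothetical, e.g. over n-gram features)
    """
    frontier = [(root, 1.0)]
    leaves: List[Tuple[int, float]] = []
    while frontier:
        expanded = []
        for node, score in frontier:
            for child in children.get(node, []):
                # Path score = parent score x sigmoid(child's linear score).
                p = 1.0 / (1.0 + np.exp(-(weights[child] @ x)))
                expanded.append((child, float(score * p)))
        leaves.extend(c for c in expanded if c[0] not in children)
        internal = [c for c in expanded if c[0] in children]
        # Keep only the best `beam` internal nodes for the next level.
        frontier = sorted(internal, key=lambda c: c[1], reverse=True)[:beam]
    return sorted(leaves, key=lambda c: c[1], reverse=True)[:topk]
```

With branching factor B and beam width b, each level scores at most b·B nodes, so the work grows with tree depth, that is, logarithmically in the number of leaves for a balanced tree.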

BoolNet: Minimizing The Energy Consumption of Binary Neural Networks

Jun 13, 2021

Nianhui Guo, Joseph Bethge, Haojin Yang, Kai Zhong, Xuefei Ning, Christoph Meinel, Yu Wang

Abstract: Recent works on Binary Neural Networks (BNNs) have made promising progress in narrowing the accuracy gap between BNNs and their 32-bit counterparts. However, the accuracy gains are often based on specialized model designs that use additional 32-bit components. Furthermore, almost all previous BNNs use 32-bit precision for the feature maps and for the shortcuts enclosing the corresponding binary convolution blocks. This helps to maintain accuracy effectively, but is not friendly to hardware accelerators with limited memory, energy, and computing resources. Thus, we raise the following question: how can accuracy and energy consumption be balanced in BNN design? We extensively study this fundamental problem and propose BoolNet, a novel BNN architecture without most of the commonly used 32-bit components. Experimental results on ImageNet demonstrate that BoolNet can achieve a 4.6x energy reduction together with 1.2% higher accuracy than the commonly used BNN architecture Bi-RealNet. Code and trained models are available at: https://github.com/hpi-xnor/BoolNet.
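
As general BNN background rather than anything specific to BoolNet: once weights and activations are constrained to {-1, +1} and packed into bits, a dot product reduces to XOR (equivalently XNOR) plus a popcount, which is why any remaining 32-bit feature maps and shortcuts weigh so heavily on cost. The sketch uses Python integers as bit vectors for clarity; real kernels operate on machine words.

```python
import numpy as np


def pack_signs(v: np.ndarray) -> int:
    """Pack a {-1, +1} vector into an integer bitmask (bit i set where v[i] == +1)."""
    bits = 0
    for i, value in enumerate(v):
        if value > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n.

    Matching bits contribute +1 and differing bits -1, so dot = n - 2 * popcount(a XOR b).
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")
```

Counting set bits with `bin(...).count("1")` keeps the sketch compatible with older Python versions; `int.bit_count()` does the same on Python 3.10 and later.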

Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Jun 09, 2021

Philip A. Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon

Abstract: Tree-based models underpin many modern semantic search engines and recommender systems due to their sub-linear inference times. In industrial applications, these models operate at extreme scales, where every bit of performance is critical. Memory constraints at extreme scales also require that models be sparse, hence tree-based models are often back-ended by sparse matrix algebra routines. However, there are currently no sparse matrix techniques specifically designed for the sparsity structure one encounters in tree-based models for extreme multi-label ranking/classification (XMR/XMC) problems. To address this issue, we present the masked sparse chunk multiplication (MSCM) technique, a sparse matrix technique specifically tailored to XMR trees. MSCM is easy to implement, embarrassingly parallelizable, and offers a significant performance boost to any existing tree inference pipeline at no cost. We perform a comprehensive study of MSCM applied to several different sparse inference schemes and benchmark our methods on a general-purpose extreme multi-label ranking framework. We observe that MSCM gives consistently dramatic speedups across both online and batch inference settings, single- and multi-threaded settings, and many different tree models and datasets. To demonstrate its utility in industrial applications, we apply MSCM to an enterprise-scale semantic product search problem with 100 million products and achieve sub-millisecond latency of 0.88 ms per query on a single thread, an 8x reduction in latency over vanilla inference techniques. MSCM requires no sacrifice in model accuracy, as it gives exactly the same results as standard sparse matrix techniques. Therefore, we believe that MSCM will enable users of XMR trees to save a substantial amount of compute resources in their inference pipelines at very little cost.
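
A simplified reading of the idea in the abstract: at each tree layer only columns whose parent survived the beam need scores (the mask), and sibling columns form a chunk that is multiplied against the sparse query in one pass. The sketch assumes each chunk stores its nonzero rows as dense vectors and finds rows with a hash-map lookup; both are illustrative choices, and the abstract notes MSCM is studied across several sparse inference schemes, so the paper's data layout may differ.

```python
from typing import Dict, List, Sequence

import numpy as np


class Chunk:
    """Columns of a layer's weight matrix that share one parent node, stored row-wise.

    `rows` maps feature index -> weights for this chunk's columns; all-zero rows are dropped.
    """

    def __init__(self, dense_block: np.ndarray):
        self.num_cols = dense_block.shape[1]
        self.rows: Dict[int, np.ndarray] = {
            i: dense_block[i].copy() for i in range(dense_block.shape[0]) if np.any(dense_block[i])
        }

    def multiply(self, x_idx: Sequence[int], x_val: Sequence[float]) -> np.ndarray:
        """Compute x^T W_chunk, touching only features nonzero in both x and the chunk."""
        out = np.zeros(self.num_cols)
        for i, v in zip(x_idx, x_val):
            row = self.rows.get(i)  # hash lookup of the query feature in the chunk
            if row is not None:
                out += v * row
        return out


def masked_layer_scores(chunks: List[Chunk], active: List[int],
                        x_idx: Sequence[int], x_val: Sequence[float]) -> Dict[int, np.ndarray]:
    """Score only the chunks whose parent survived the previous beam (the mask)."""
    return {c: chunks[c].multiply(x_idx, x_val) for c in active}
```

Because the chunked product sums the same nonzero terms as an ordinary sparse product, the scores match the reference product, consistent with the abstract's claim that MSCM does not change model outputs.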

Machine Learning for Electronic Design Automation: A Survey

Jan 10, 2021

Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong (+6 more)

Abstract: With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) circuits is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can be traced back to the 1990s, recent breakthroughs in ML and the increasing complexity of EDA tasks have aroused more interest in incorporating ML to solve EDA tasks. In this paper, we present a comprehensive review of existing ML for EDA studies, organized following the EDA hierarchy.

PECOS: Prediction for Enormous and Correlated Output Spaces

Oct 12, 2020

Hsiang-Fu Yu, Kai Zhong, Inderjit S. Dhillon

Abstract: Many challenging problems in modern applications amount to finding relevant results from an enormous output space of potential candidates. The size of the output space for these problems can range from millions to billions. Moreover, training data is often limited for many of the so-called "long-tail" items in the output space. Given the inherent paucity of training data for most of the items in the output space, developing machine-learned models that perform well for spaces of this size is challenging. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space. PECOS is a three-phase framework: (i) in the first phase, PECOS organizes the output space using a semantic indexing scheme; (ii) in the second phase, PECOS uses the indexing to narrow down the output space by orders of magnitude using a machine-learned matching scheme; and (iii) in the third phase, PECOS ranks the matched items using a final ranking scheme. The versatility and modularity of PECOS allow for easy plug-and-play of various choices for the indexing, matching, and ranking phases. On a dataset where the output space is of size 2.8 million, PECOS with a neural matcher results in a 10% increase in precision@1 (from 46% to 51.2%) over PECOS with a recursive linear matcher, but takes 265x more time to train. We also develop fast real-time inference procedures; for example, inference takes less than 10 milliseconds on the dataset with 2.8 million labels.
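
The three phases can be illustrated end to end on toy data. This is a minimal sketch under loud assumptions: labels are represented by given embedding vectors, indexing is one level of plain k-means (not the indexing scheme used in the paper), and both the matcher and the ranker are simple inner products rather than learned models. The function names are hypothetical and are not the PECOS library API.

```python
from typing import List, Tuple

import numpy as np


def build_index(label_emb: np.ndarray, num_clusters: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Phase 1 (indexing): group labels with k-means on their embeddings; returns a cluster id per label."""
    rng = np.random.default_rng(seed)
    centers = label_emb[rng.choice(len(label_emb), num_clusters, replace=False)]  # copy, not a view
    assign = np.zeros(len(label_emb), dtype=int)
    for _ in range(iters):
        # Assign each label to its nearest center (squared Euclidean distance).
        assign = np.argmin(((label_emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(num_clusters):
            members = label_emb[assign == k]
            if len(members):  # keep the old center if a cluster empties out
                centers[k] = members.mean(axis=0)
    return assign


def predict(x: np.ndarray, label_emb: np.ndarray, assign: np.ndarray,
            beam: int = 2, topk: int = 3) -> List[Tuple[int, float]]:
    # Phase 2 (matching): score each non-empty cluster by its mean label embedding, keep the best `beam`.
    clusters = np.unique(assign)
    centroids = np.stack([label_emb[assign == k].mean(axis=0) for k in clusters])
    matched = clusters[np.argsort(-(centroids @ x))[:beam]]
    # Phase 3 (ranking): score only the labels inside the matched clusters.
    candidates = np.flatnonzero(np.isin(assign, matched))
    scores = label_emb[candidates] @ x
    order = np.argsort(-scores)[:topk]
    return [(int(candidates[i]), float(scores[i])) for i in order]
```

Swapping `build_index` or the scoring inside `predict` for other choices leaves the rest untouched, which mirrors the plug-and-play modularity the abstract describes.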
