Abstract:Retrieval-Augmented Generation (RAG) is widely adopted for its effectiveness and cost-efficiency in mitigating hallucinations and enhancing the domain-specific generation capabilities of large language models (LLMs). However, is this effectiveness and cost-efficiency truly a free lunch? In this study, we comprehensively investigate the fairness costs associated with RAG by proposing a practical three-level threat model from the perspective of user awareness of fairness. Specifically, varying levels of user fairness awareness result in different degrees of fairness censorship on the external dataset. We examine the fairness implications of RAG using uncensored, partially censored, and fully censored datasets. Our experiments demonstrate that fairness alignment can be easily undermined through RAG without the need for fine-tuning or retraining. Even with fully censored and supposedly unbiased external datasets, RAG can lead to biased outputs. Our findings underscore the limitations of current alignment methods in the context of RAG-based LLMs and highlight the urgent need for new strategies to ensure fairness. We propose potential mitigations and call for further research to develop robust fairness safeguards in RAG-based LLMs.
Abstract:Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories including metaphor, personification, hyperbole and parallelism and 23 fine-grained categories across both form and content levels. CERD is a manually annotated and comprehensive Chinese rhetoric dataset with five interrelated sub-tasks. Unlike previous work, our dataset aids in understanding various rhetorical devices, recognizing corresponding rhetorical components, and generating rhetorical sentences under given conditions, thereby improving the author's writing proficiency and language usage skills. Extensive experiments are conducted to demonstrate the interrelations between multiple tasks in CERD, as well as to establish a benchmark for future research on rhetoric. The experimental results indicate that Large Language Models achieve the best performance across most tasks, and jointly fine-tuning with multiple tasks further enhances performance.
Abstract:This paper presents an experimental study on mmWave beam profiling on a mmWave testbed, and develops a machine learning model for beamforming based on the experiment data. The datasets we have obtained from the beam profiling and the machine learning model for beamforming are valuable for a broad set of network design problems, such as network topology optimization, user equipment association, power allocation, and beam scheduling, in complex and dynamic mmWave networks. We have used two commercial-grade mmWave testbeds with operational frequencies on the 27 Ghz and 71 GHz, respectively, for beam profiling. The obtained datasets were used to train the machine learning model to estimate the received downlink signal power, and data rate at the receivers (user equipment with different geographical locations in the range of a transmitter (base station). The results have shown high prediction accuracy with low mean square error (loss), indicating the model's ability to estimate the received signal power or data rate at each individual receiver covered by a beam. The dataset and the machine learning-based beamforming model can assist researchers in optimizing various network design problems for mmWave networks.
Abstract:The prevalent use of Transformer-like models, exemplified by ChatGPT in modern language processing applications, underscores the critical need for enabling private inference essential for many cloud-based services reliant on such models. However, current privacy-preserving frameworks impose significant communication burden, especially for non-linear computation in Transformer model. In this paper, we introduce a novel plug-in method Comet to effectively reduce the communication cost without compromising the inference performance. We second introduce an efficient approximation method to eliminate the heavy communication in finding good initial approximation. We evaluate our Comet on Bert and RoBERTa models with GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups while keep competitive model performance compared to the prior art.
Abstract:Energy is an essential, but often forgotten aspect in large-scale federated systems. As most of the research focuses on tackling computational and statistical heterogeneity from the machine learning algorithms, the impact on the mobile system still remains unclear. In this paper, we design and implement an online optimization framework by connecting asynchronous execution of federated training with application co-running to minimize energy consumption on battery-powered mobile devices. From a series of experiments, we find that co-running the training process in the background with foreground applications gives the system a deep energy discount with negligible performance slowdown. Based on these results, we first study an offline problem assuming all the future occurrences of applications are available, and propose a dynamic programming-based algorithm. Then we propose an online algorithm using the Lyapunov framework to explore the solution space via the energy-staleness trade-off. The extensive experiments demonstrate that the online optimization framework can save over 60% energy with 3 times faster convergence speed compared to the previous schemes.
Abstract:Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, such convenience comes with a cost of privacy because users have to upload their private data to the cloud. This research aims to provide effective and efficient MLaaS such that the cloud server learns nothing about user data and the users cannot infer the proprietary model parameters owned by the server. This work makes the following contributions. First, it unveils the fundamental performance bottleneck of existing schemes due to the heavy permutations in computing linear transformation and the use of communication intensive Garbled Circuits for nonlinear transformation. Second, it introduces an ultra-fast secure MLaaS framework, CHEETAH, which features a carefully crafted secret sharing scheme that runs significantly faster than existing schemes without accuracy loss. Third, CHEETAH is evaluated on the benchmark of well-known, practical deep networks such as AlexNet and VGG-16 on the MNIST and ImageNet datasets. The results demonstrate more than 100x speedup over the fastest GAZELLE (Usenix Security'18), 2000x speedup over MiniONN (ACM CCS'17) and five orders of magnitude speedup over CryptoNets (ICML'16). This significant speedup enables a wide range of practical applications based on privacy-preserved deep neural networks.