Abstract:Recent advancements in retrieval-augmented generation (RAG) have demonstrated impressive performance in the question-answering (QA) task. However, most previous works predominantly focus on text-based answers. While some studies address multimodal data, they still fall short in generating comprehensive multimodal answers, particularly for explaining concepts or providing step-by-step tutorials on how to accomplish specific goals. This capability is especially valuable for applications such as enterprise chatbots and settings such as customer service and educational systems, where the answers are sourced from multimodal data. In this paper, we introduce a simple and effective framework named MuRAR (Multimodal Retrieval and Answer Refinement). MuRAR enhances text-based answers by retrieving relevant multimodal data and refining the responses to create coherent multimodal answers. This framework can be easily extended to support multimodal answers in enterprise chatbots with minimal modifications. Human evaluation results indicate that multimodal answers generated by MuRAR are more useful and readable compared to plain text answers.
Abstract:Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to high-speed Internet in under-served areas without additional cost to expensive licensed spectrum. However, the current methods to utilize these white spaces are inefficient due to very conservative models and spectrum policies, causing under-utilization of valuable spectrum resources. This hampers the full potential of innovative wireless technologies that could benefit farmers, small Internet Service Providers (ISPs) or Mobile Network Operators (MNOs) operating in rural regions. This study explores the challenges faced by farmers and service providers when using shared spectrum bands to deploy their networks while ensuring maximum system performance and minimizing interference with other users. Additionally, we discuss how spatiotemporal spectrum models, in conjunction with database-driven spectrum-sharing solutions, can enhance the allocation and management of spectrum resources, ultimately improving the efficiency and reliability of wireless networks operating in shared spectrum bands.
Abstract:Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing. Also, with appropriate algorithmic designs, one could achieve the desirable linear speedup for convergence effect in FL. However, most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers and results on decentralized FL with heterogeneous datasets remains limited. Moreover, whether or not the linear speedup for convergence is achievable under fully decentralized FL with data heterogeneity remains an open question. In this paper, we address these challenges by proposing a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity. The key idea of our algorithm is to enhance the local update scheme in FL (originally intended for communication efficiency) by incorporating a recursive gradient correction technique to handle heterogeneous datasets. We show that, under appropriate parameter settings, the proposed NET-FLEET algorithm achieves a linear speedup for convergence. We further conduct extensive numerical experiments to evaluate the performance of the proposed NET-FLEET algorithm and verify our theoretical findings.
Abstract:Decentralized nonconvex optimization has received increasing attention in recent years in machine learning due to its advantages in system robustness, data privacy, and implementation simplicity. However, three fundamental challenges in designing decentralized optimization algorithms are how to reduce their sample, communication, and memory complexities. In this paper, we propose a \underline{g}radient-\underline{t}racking-based \underline{sto}chastic \underline{r}ecursive \underline{m}omentum (GT-STORM) algorithm for efficiently solving nonconvex optimization problems. We show that to reach an $\epsilon^2$-stationary solution, the total number of sample evaluations of our algorithm is $\tilde{O}(m^{1/2}\epsilon^{-3})$ and the number of communication rounds is $\tilde{O}(m^{-1/2}\epsilon^{-3})$, which improve the $O(\epsilon^{-4})$ costs of sample evaluations and communications for the existing decentralized stochastic gradient algorithms. We conduct extensive experiments with a variety of learning models, including non-convex logistical regression and convolutional neural networks, to verify our theoretical findings. Collectively, our results contribute to the state of the art of theories and algorithms for decentralized network optimization.
Abstract:With rise of machine learning (ML) and the proliferation of smart mobile devices, recent years have witnessed a surge of interest in performing ML in wireless edge networks. In this paper, we consider the problem of jointly improving data privacy and communication efficiency of distributed edge learning, both of which are critical performance metrics in wireless edge network computing. Toward this end, we propose a new decentralized stochastic gradient method with sparse differential Gaussian-masked stochastic gradients (SDM-DSGD) for non-convex distributed edge learning. Our main contributions are three-fold: i) We theoretically establish the privacy and communication efficiency performance guarantee of our SDM-DSGD method, which outperforms all existing works; ii) We show that SDM-DSGD improves the fundamental training-privacy trade-off by {\em two orders of magnitude} compared with the state-of-the-art. iii) We reveal theoretical insights and offer practical design guidelines for the interactions between privacy preservation and communication efficiency, two conflicting performance goals. We conduct extensive experiments with a variety of learning models on MNIST and CIFAR-10 datasets to verify our theoretical findings. Collectively, our results contribute to the theory and algorithm design for distributed edge learning.
Abstract:In this work, we consider to improve the model estimation efficiency by aggregating the neighbors' information as well as identify the subgroup membership for each node in the network. A tree-based $l_1$ penalty is proposed to save the computation and communication cost. We design a decentralized generalized alternating direction method of multiplier algorithm for solving the objective function in parallel. The theoretical properties are derived to guarantee both the model consistency and the algorithm convergence. Thorough numerical experiments are also conducted to back up our theory, which also show that our approach outperforms in the aspects of the estimation accuracy, computation speed and communication cost.
Abstract:Random forest (RF) methodology is one of the most popular machine learning techniques for prediction problems. In this article, we discuss some cases where random forests may suffer and propose a novel generalized RF method, namely regression-enhanced random forests (RERFs), that can improve on RFs by borrowing the strength of penalized parametric regression. The algorithm for constructing RERFs and selecting its tuning parameters is described. Both simulation study and real data examples show that RERFs have better predictive performance than RFs in important situations often encountered in practice. Moreover, RERFs may incorporate known relationships between the response and the predictors, and may give reliable predictions in extrapolation problems where predictions are required at points out of the domain of the training dataset. Strategies analogous to those described here can be used to improve other machine learning methods via combination with penalized parametric regression techniques.
Abstract:Detecting weak clustered signal in spatial data is important but challenging in applications such as medical image and epidemiology. A more efficient detection algorithm can provide more precise early warning, and effectively reduce the decision risk and cost. To date, many methods have been developed to detect signals with spatial structures. However, most of the existing methods are either too conservative for weak signals or computationally too intensive. In this paper, we consider a novel method named Spatial CUSUM (SCUSUM), which employs the idea of the CUSUM procedure and false discovery rate controlling. We develop theoretical properties of the method which indicates that asymptotically SCUSUM can reach high classification accuracy. In the simulation study, we demonstrate that SCUSUM is sensitive to weak spatial signals. This new method is applied to a real fMRI dataset as illustration, and more irregular weak spatial signals are detected in the images compared to some existing methods, including the conventional FDR, FDR$_L$ and scan statistics.
Abstract:Understanding the convergence performance of asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to their foundational role in machine learning. To date, however, most of the existing works are restricted to either bounded gradient delays or convex settings. In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses increasing batch size) for non-convex optimization problems with unbounded gradient delays. We prove $o(1/\sqrt{k})$ convergence rate for Async-SGD and $o(1/k)$ for Async-SGDI. Also, a unifying sufficient condition for Async-SGD's convergence is established, which includes two major gradient delay models in the literature as special cases and yields a new delay model not considered thus far.
Abstract:A novel multi-resolution cluster detection (MCD) method is proposed to identify irregularly shaped clusters in space. Multi-scale test statistic on a single cell is derived based on likelihood ratio statistic for Bernoulli sequence, Poisson sequence and Normal sequence. A neighborhood variability measure is defined to select the optimal test threshold. The MCD method is compared with single scale testing methods controlling for false discovery rate and the spatial scan statistics using simulation and f-MRI data. The MCD method is shown to be more effective for discovering irregularly shaped clusters, and the implementation of this method does not require heavy computation, making it suitable for cluster detection for large spatial data.