Abstract: This paper studies transfer learning in heterogeneous multi-source settings where the distributions of the target and auxiliary domains diverge. To address statistical bias and computational cost, we propose Sparse Optimization for Transfer Learning (SOTL), a framework based on L0 regularization. The method extends the Joint Estimation Transferred from Strata (JETS) paradigm with two key innovations: (1) an L0 constraint that enforces exact sparsity, compressing the parameter space and reducing complexity, and (2) an optimization that focuses on the target parameters rather than redundant ones. Simulations show that SOTL significantly improves both estimation accuracy and computational speed, especially under adversarial auxiliary domain conditions. Empirical validation on the Community and Crime benchmarks demonstrates the statistical robustness of SOTL in cross-domain transfer.
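To make "L0-constrained exact sparsity" concrete, the following is a minimal sketch of iterative hard thresholding for least squares under a constraint of at most s nonzero coefficients. It is a generic illustration, not the SOTL algorithm: the abstract does not specify the solver, the sketch does not model the target/auxiliary structure of JETS, and the function names, the simulated data, and the step-size choice are all assumptions made here.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argsort(np.abs(v))[-s:]
        out[keep] = v[keep]
    return out


def iht_least_squares(X, y, s, n_iter=200):
    """Iterative hard thresholding for
    min ||y - X b||^2 / (2n)  subject to  ||b||_0 <= s  (illustrative only)."""
    n, p = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    L = np.linalg.norm(X, 2) ** 2 / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        # Gradient step followed by projection onto the set of s-sparse vectors.
        beta = hard_threshold(beta - grad / L, s)
    return beta


# Hypothetical simulated data: 3 strong signals among 30 predictors.
rng = np.random.default_rng(0)
n, p, s = 400, 30, 3
beta_true = np.zeros(p)
beta_true[[2, 11, 20]] = [3.0, -3.0, 3.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = iht_least_squares(X, y, s)
support = sorted(np.flatnonzero(beta_hat).tolist())
print("estimated support:", support)
```

Unlike an L1 penalty, which shrinks coefficients toward zero, the hard-thresholding projection returns an estimate with at most s nonzero entries by construction, which is the sense in which the constraint compresses the parameter space.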
Abstract: In this paper, we propose a communication-efficient penalized regression algorithm for high-dimensional sparse linear regression with massive data. The algorithm, named CESDAR, is a communication-optimized distributed method based on the Enhanced Support Detection and Root finding algorithm. CESDAR uses data distributed across multiple machines to compute and update the active set, and it applies the communication-efficient surrogate likelihood framework to approximate the full-sample optimal solution on the active set. Because no raw data are transmitted, the approach enhances privacy and data security while significantly improving execution speed and substantially reducing communication costs. Notably, it achieves the same statistical accuracy as the global estimator. We also explore an extended version and an adaptive version of CESDAR, which enhance algorithmic speed and optimize parameter selection, respectively. Simulations and real-data benchmark experiments demonstrate the efficiency and accuracy of the CESDAR algorithm.
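The idea of updating an active set from distributed data without moving raw observations can be sketched as follows. This is a simplified illustration under stated assumptions, not the CESDAR algorithm itself: it follows the basic support detection and root finding pattern (choose the T largest entries of |beta + d|, then solve least squares on that set), simulates the machines as in-memory shards, and solves the active-set problem by exactly aggregating small T x T Gram matrices instead of using the surrogate likelihood step the abstract describes. The function name `distributed_sdar`, the sparsity level T, and the simulated data are hypothetical.

```python
import numpy as np


def distributed_sdar(shards, T, max_iter=20):
    """SDAR-style iterations in which each machine sends only summaries.

    shards: list of (X_m, y_m) pairs, one per simulated machine.
    T: assumed size of the active set.
    """
    N = sum(X_m.shape[0] for X_m, _ in shards)
    p = shards[0][0].shape[1]
    beta = np.zeros(p)
    active = None
    for _ in range(max_iter):
        # Each machine contributes a length-p vector X_m^T (y_m - X_m beta).
        d = sum(X_m.T @ (y_m - X_m @ beta) for X_m, y_m in shards) / N
        # Support detection: indices of the T largest entries of |beta + d|.
        new_active = np.sort(np.argsort(np.abs(beta + d))[-T:])
        if active is not None and np.array_equal(new_active, active):
            break  # active set has stabilized
        active = new_active
        # Root finding on the active set. Each machine contributes a T x T
        # Gram matrix and a length-T vector (exact aggregation, swapped in
        # here for the surrogate likelihood approximation).
        G = sum(X_m[:, active].T @ X_m[:, active] for X_m, _ in shards)
        b = sum(X_m[:, active].T @ y_m for X_m, y_m in shards)
        beta = np.zeros(p)
        beta[active] = np.linalg.solve(G, b)
    return beta, active


# Hypothetical simulated data split across 4 machines.
rng = np.random.default_rng(1)
N, p, T = 400, 30, 3
beta_true = np.zeros(p)
beta_true[[4, 9, 17]] = [3.0, -3.0, 3.0]
X = rng.standard_normal((N, p))
y = X @ beta_true + 0.1 * rng.standard_normal(N)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

beta_hat, active = distributed_sdar(shards, T)
print("active set:", active.tolist())
```

In this sketch the per-iteration communication from each machine is one vector of length p plus a T x T matrix and a vector of length T, so its size depends on p and T and not on the number of observations held locally.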