



Abstract: Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources of a central server to train model partitions. Data heterogeneity across silos, however, presents a major challenge that undermines the convergence speed and accuracy of the global model. This paper introduces Step-wise Momentum Fusion (SMoFi), an effective and lightweight framework that counteracts gradient divergence arising from data heterogeneity by synchronizing the momentum buffers across server-side optimizers. To control gradient divergence over the course of training, we design a staleness-aware alignment mechanism that constrains the gradient updates of the server-side submodel at each optimization step. Extensive validation on multiple real-world datasets shows that SMoFi consistently improves global model accuracy (by up to 7.1%) and convergence speed (by up to 10.25$\times$). Furthermore, SMoFi's impact grows with the number of participating clients and the depth of the model, making it particularly suitable for model training in resource-constrained contexts.
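
The abstract does not include the algorithm itself; the following is a minimal PyTorch sketch of what step-wise fusion of server-side momentum buffers could look like, assuming one SGD-with-momentum optimizer per client's server-side submodel and an exponential staleness-decay weighting. The function name `fuse_momentum` and the `decay` parameter are illustrative assumptions, not the paper's API.

```python
import torch

def fuse_momentum(optimizers, staleness, decay=0.5):
    """Fuse SGD momentum buffers across per-client server-side optimizers,
    down-weighting clients whose updates are stale (hypothetical scheme)."""
    # Staleness-aware weights: older updates contribute less.
    weights = torch.tensor([decay ** s for s in staleness], dtype=torch.float32)
    weights /= weights.sum()

    # Assume all server-side submodels share one architecture, so the
    # optimizers' parameters line up one-to-one after flattening the groups.
    param_lists = [
        [p for g in opt.param_groups for p in g["params"]] for opt in optimizers
    ]
    for params in zip(*param_lists):
        bufs = []
        for opt, p in zip(optimizers, params):
            state = opt.state[p]
            if "momentum_buffer" in state:
                bufs.append(state["momentum_buffer"])
        if len(bufs) != len(optimizers):
            continue  # skip until every optimizer has taken a step
        # Weighted average of momentum buffers, written back to every optimizer.
        fused = sum(w * b for w, b in zip(weights, bufs))
        for opt, p in zip(optimizers, params):
            opt.state[p]["momentum_buffer"] = fused.clone()
```

In this sketch, `fuse_momentum` would be called once per optimization step, after each optimizer's `step()`, so the momentum buffers are re-synchronized before the next update.
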
Abstract: Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on 11.2T high-quality tokens and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints at every one trillion tokens, providing valuable insights into the learning dynamics of large language models.
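
As context for how an MoE model activates only a subset of parameters per token, below is a generic top-k routed MoE layer in PyTorch. This is a textbook-style sketch, not dots.llm1's actual architecture; the expert count, `k`, and gating scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k MoE layer: each token is routed to k of n experts,
    so only a fraction of the layer's parameters are active per token."""
    def __init__(self, d_model, d_ff, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)      # pick k experts per token
        gates = F.softmax(topv, dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += gates[mask, slot, None] * expert(x[mask])
        return out
```

With 8 experts and k=2, each token touches only a quarter of the expert parameters, which is the mechanism behind the 14B-active-of-142B-total ratio described above (the exact numbers in dots.llm1 depend on its own expert configuration).
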




Abstract: Visual Odometry (VO), which estimates the position and orientation of a moving object by analyzing image sequences captured by on-board cameras, has been well investigated with the rising interest in autonomous driving. This paper studies monocular VO from the perspective of Deep Learning (DL). Unlike most current learning-based methods, our approach, called DeepAVO, is built on the intuition that features contribute discriminately to different motion patterns. Specifically, we present a novel four-branch network that learns rotation and translation by leveraging Convolutional Neural Networks (CNNs) to focus on different quadrants of the optical flow input. To enhance feature selection, we further introduce an effective channel-spatial attention mechanism that forces each branch to explicitly distill the information relevant to its specific Frame-to-Frame (F2F) motion estimate. Experiments on various datasets covering outdoor driving and indoor walking scenarios show that the proposed DeepAVO outperforms state-of-the-art monocular methods by a large margin, achieves performance competitive with stereo VO algorithms, and shows promising potential for generalization.
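
To make the four-branch idea concrete, here is an illustrative PyTorch sketch that splits an optical-flow map into quadrants, passes each quadrant through a small CNN branch gated by CBAM-style channel and spatial attention, and regresses rotation and translation from the fused features. Layer sizes, the attention design, and the 3-DoF output heads are assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One quadrant branch: a small CNN followed by channel and spatial
    attention gates (a CBAM-style stand-in for the paper's mechanism)."""
    def __init__(self, c_in=2, c_out=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c_out, c_out, 1), nn.Sigmoid()
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(c_out, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )
        self.pool = nn.AdaptiveAvgPool2d(4)

    def forward(self, x):
        f = self.conv(x)
        f = f * self.channel_gate(f)    # reweight channels
        f = f * self.spatial_gate(f)    # reweight spatial locations
        return self.pool(f).flatten(1)  # (B, c_out * 16)

class FourBranchVO(nn.Module):
    """Four branches, one per optical-flow quadrant, fused into
    separate rotation and translation regressors (hypothetical heads)."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(Branch() for _ in range(4))
        fused = 4 * 64 * 16
        self.rot = nn.Linear(fused, 3)    # e.g. Euler angles (assumed)
        self.trans = nn.Linear(fused, 3)  # translation vector (assumed)

    def forward(self, flow):              # flow: (B, 2, H, W)
        h, w = flow.shape[-2] // 2, flow.shape[-1] // 2
        quads = [flow[..., :h, :w], flow[..., :h, w:],
                 flow[..., h:, :w], flow[..., h:, w:]]
        feats = torch.cat([b(q) for b, q in zip(self.branches, quads)], dim=1)
        return self.rot(feats), self.trans(feats)
```

Giving each branch its own attention-gated view of one quadrant matches the abstract's intuition that different regions of the flow field contribute differently to rotation versus translation; the fusion and head design here are only one plausible realization.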