Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiamin Cao

TrainMover: Efficient ML Training Live Migration with No Memory Overhead

Dec 17, 2024

ChonLam Lao, Minlan Yu, Aditya Akella, Jiamin Cao, Yu Guan, Pengcheng Zhang, Zhilong Zheng, Yichi Xu, Ennan Zhai, Dennis Cai(+1 more)

Figure 1 for TrainMover: Efficient ML Training Live Migration with No Memory Overhead

Figure 2 for TrainMover: Efficient ML Training Live Migration with No Memory Overhead

Figure 3 for TrainMover: Efficient ML Training Live Migration with No Memory Overhead

Figure 4 for TrainMover: Efficient ML Training Live Migration with No Memory Overhead

Abstract:Machine learning training has emerged as one of the most prominent workloads in modern data centers. These training jobs are large-scale, long-lasting, and tightly coupled, and are often disrupted by various events in the cluster such as failures, maintenance, and job scheduling. To handle these events, we rely on cold migration, where we first checkpoint the entire cluster, replace the related machines, and then restart the training. This approach leads to disruptions to the training jobs, resulting in significant downtime. In this paper, we present TrainMover, a live migration system that enables machine replacement during machine learning training. TrainMover minimizes downtime by leveraging member replacement of collective communication groups and sandbox lazy initialization. Our evaluation demonstrates that TrainMover achieves 16x less downtime compared to all baselines, effectively handling data center events like straggler rebalancing, maintenance, and unexpected failures.

* 13 pages body, 19 pages total

Via

Access Paper or Ask Questions

Cora: Accelerating Stateful Network Applications with SmartNICs

Oct 29, 2024

Shaoke Xi, Jiaqi Gao, Mengqi Liu, Jiamin Cao, Fuliang Li, Kai Bu, Kui Ren, Minlan Yu, Dennis Cai, Ennan Zhai

Figure 1 for Cora: Accelerating Stateful Network Applications with SmartNICs

Figure 2 for Cora: Accelerating Stateful Network Applications with SmartNICs

Figure 3 for Cora: Accelerating Stateful Network Applications with SmartNICs

Figure 4 for Cora: Accelerating Stateful Network Applications with SmartNICs

Abstract:With the growing performance requirements on networked applications, there is a new trend of offloading stateful network applications to SmartNICs to improve performance and reduce the total cost of ownership. However, offloading stateful network applications is non-trivial due to state operation complexity, state resource consumption, and the complicated relationship between traffic and state. Naively partitioning the program by state or traffic can result in a suboptimal partition plan with higher CPU usage or even packet drops. In this paper, we propose Cora, a compiler and runtime that offloads stateful network applications to SmartNIC-accelerated hosts. Cora compiler introduces an accurate performance model for each SmartNIC and employs an efficient compiling algorithm to search the offloading plan. Cora runtime can monitor traffic dynamics and adapt to minimize CPU usage. Cora is built atop Netronome Agilio and BlueField 2 SmartNICs. Our evaluation shows that for the same throughput target, Cora can propose partition plans saving up to 94.0% CPU cores, 1.9 times more than baseline solutions. Under the same resource constraint, Cora can accelerate network functions by 44.9%-82.3%. Cora runtime can adapt to traffic changes and keep CPU usage low.

Via

Access Paper or Ask Questions

Improving the Multi-label Atomic Activity Recognition by Robust Visual Feature and Advanced Attention @ ROAD++ Atomic Activity Recognition 2024

Oct 21, 2024

Jiamin Cao, Lingqi Wang, Kexin Zhang, Yuting Yang, Licheng Jiao, Yuwei Guo

Figure 1 for Improving the Multi-label Atomic Activity Recognition by Robust Visual Feature and Advanced Attention @ ROAD++ Atomic Activity Recognition 2024

Abstract:Road++ Track3 proposes a multi-label atomic activity recognition task in traffic scenarios, which can be standardized as a 64-class multi-label video action recognition task. In the multi-label atomic activity recognition task, the robustness of visual feature extraction remains a key challenge, which directly affects the model performance and generalization ability. To cope with these issues, our team optimized three aspects: data processing, model and post-processing. Firstly, the appropriate resolution and video sampling strategy are selected, and a fixed sampling strategy is set on the validation and test sets. Secondly, in terms of model training, the team selects a variety of visual backbone networks for feature extraction, and then introduces the action-slot model, which is trained on the training and validation sets, and reasoned on the test set. Finally, for post-processing, the team combined the strengths and weaknesses of different models for weighted fusion, and the final mAP on the test set was 58%, which is 4% higher than the challenge baseline.

Via

Access Paper or Ask Questions