Title: Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines

May 18, 2024

Chaokun Chang, Eric Lo, Chunxiao Ye

Abstract: Machine learning inference pipelines commonly encountered in data science and industry often require real-time responsiveness due to their user-facing nature. However, meeting this requirement becomes particularly challenging when certain input features require aggregating a large volume of data online. Recent literature on interpretable machine learning reveals that most machine learning models exhibit a notable degree of resilience to variations in input. This suggests that machine learning models can accommodate approximate input features with minimal discernible impact on accuracy. In this paper, we introduce Biathlon, a novel ML serving system that leverages the inherent resilience of models and determines the optimal degree of approximation for each aggregation feature. This approach enables maximum speedup while ensuring a guaranteed bound on accuracy loss. We evaluate Biathlon on real pipelines from both industry applications and data science competitions, demonstrating its ability to meet real-time latency requirements by achieving 5.3x to 16.6x speedup with almost no accuracy loss.
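
The general idea of feeding a model an approximate aggregation feature can be illustrated with a minimal Python sketch. This is not Biathlon's algorithm, which the abstract does not spell out. It assumes a single mean-aggregation feature, uniform random sampling, and a model that is monotone in that feature, so checking the two ends of the feature's plausible range is enough. The names `approximate_mean`, `predict_with_approximation`, `max_shift`, `start`, `growth`, and `z` are all hypothetical.

```python
import random
import statistics


def approximate_mean(values, sample_size, rng):
    """Estimate the mean from a uniform random sample and return it with its standard error."""
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return mean, stderr


def predict_with_approximation(model, values, max_shift, rng,
                               start=100, growth=2, z=3.0):
    """Grow the sample until the prediction is stable across the feature's plausible range.

    Assumption: `model` is monotone in the feature, so evaluating it at
    mean - z*stderr and mean + z*stderr brackets the prediction.
    Returns the prediction and the number of rows actually aggregated.
    """
    n = min(start, len(values))
    while True:
        mean, stderr = approximate_mean(values, n, rng)
        if n == len(values):
            # The sample is the whole dataset, so the feature is exact.
            return model(mean), n
        low = model(mean - z * stderr)
        high = model(mean + z * stderr)
        if abs(high - low) <= max_shift:
            # The model is insensitive to the remaining uncertainty; stop early.
            return model(mean), n
        n = min(n * growth, len(values))
```

With a model that is only weakly sensitive to the feature, the loop tends to stop after aggregating a small fraction of the rows, which is where the latency saving comes from. A more sensitive model forces larger samples, up to the exact computation.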
