Abstract:Video analytics systems based on deep learning models are often opaque and brittle and require explanation systems to help users debug. Current model explanation system are very good at giving literal explanations of behavior in terms of pixel contributions but cannot integrate information about the physical or systems processes that might influence a prediction. This paper introduces the idea that a simple form of causal reasoning, called a regression discontinuity design, can be used to associate changes in multiple key performance indicators to physical real world phenomena to give users a more actionable set of video analytics explanations. We overview the system architecture and describe a vision of the impact that such a system might have.
Abstract:Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
Abstract:The relevant features for a machine learning task may be aggregated from data sources collected on different nodes in a network. This problem, which we call decentralized prediction, creates a number of interesting systems challenges in managing data routing, placing computation, and time-synchronization. This paper presents EdgeServe, a machine learning system that can serve decentralized predictions. EdgeServe relies on a low-latency message broker to route data through a network to nodes that can serve predictions. EdgeServe relies on a series of novel optimizations that can tradeoff computation, communication, and accuracy. We evaluate EdgeServe on three decentralized prediction tasks: (1) multi-camera object tracking, (2) network intrusion detection, and (3) human activity recognition.