Zheyu Fu

Efficiently Serving LLM Reasoning Programs with Certaindex

Dec 30, 2024

Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang


Abstract: The rapid evolution of large language models (LLMs) has unlocked their capabilities in advanced reasoning tasks like mathematical problem-solving, code generation, and legal analysis. Central to this progress are inference-time reasoning algorithms, which refine outputs by exploring multiple solution paths, at the cost of increased compute demands and response latencies. Existing serving systems fail to adapt to the scaling behaviors of these algorithms or the varying difficulty of queries, leading to inefficient resource use and unmet latency targets. We present Dynasor, a system that optimizes inference-time compute for LLM reasoning queries. Unlike traditional engines, Dynasor tracks and schedules requests within reasoning queries and uses Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. Dynasor co-adapts scheduling with reasoning progress: it allocates more compute to hard queries, reduces compute for simpler ones, and terminates unpromising queries early, balancing accuracy, latency, and cost. On diverse datasets and algorithms, Dynasor reduces compute by up to 50% in batch processing and sustains 3.3x higher query rates or 4.7x tighter latency SLOs in online serving.
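
The abstract does not define Certaindex, so the following is only a rough sketch of the general idea of certainty-guided compute allocation, not Dynasor's implementation. It assumes a self-consistency style algorithm in which the same query is sampled repeatedly, and it uses one minus the normalized entropy of the sampled answers as a stand-in certainty signal. The names `certainty`, `adaptive_sample`, and `sample_fn`, and the threshold values, are hypothetical.

```python
import math
from collections import Counter


def certainty(answers):
    """Illustrative certainty proxy: 1 - normalized entropy of the answers.

    Returns 1.0 when every sampled answer agrees and values near 0.0 when
    every answer is different. This is an assumed stand-in, not Certaindex.
    """
    if not answers:
        return 0.0
    counts = Counter(answers)
    if len(counts) == 1:
        return 1.0
    n = len(answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # log(n) is the entropy when all n answers are distinct.
    return 1.0 - entropy / math.log(n)


def adaptive_sample(sample_fn, max_samples=16, min_samples=4, threshold=0.8):
    """Sample answers until the certainty proxy crosses the threshold.

    sample_fn is a hypothetical callable that runs one reasoning path and
    returns its final answer. Returns (majority_answer, samples_used).
    """
    answers = []
    while len(answers) < max_samples:
        answers.append(sample_fn())
        if len(answers) >= min_samples and certainty(answers) >= threshold:
            break  # easy query: stop early and save compute
    majority = Counter(answers).most_common(1)[0][0]
    return majority, len(answers)
```

Under these assumptions, a query whose samples agree immediately stops at `min_samples`, while a query whose samples keep disagreeing consumes the full `max_samples` budget. A serving system could use the same signal to cut off queries that stay uncertain, which this sketch does not attempt.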


Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model

Sep 23, 2022

Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, Volodymyr Kindratenko


Abstract: Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. Also, considering that collecting frame-level labels for videos is too laborious, we design a weakly supervised two-stage training scheme, where we utilize a multiple-instance-learning loss calculated on video-level labels to train the score generator, and adopt the self-training technique to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, whose performance on the dataset exceeds that of several previous state-of-the-art approaches. Furthermore, we collect a new dataset, VFD-2000, that specializes in video fight detection, with a larger scale and more scenarios than existing datasets. The implementation of our method and the proposed dataset will be publicly available at https://github.com/Hepta-Col/VideoFightDetection.
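
The abstract does not give the exact form of the multiple-instance-learning loss. As a minimal sketch, assuming the hinge-style ranking formulation commonly used in weakly supervised video anomaly detection, each video is treated as a bag of segment scores and only the video-level label (fight or normal) is needed. The function name `mil_ranking_loss` and the `margin` parameter are hypothetical.

```python
def mil_ranking_loss(fight_scores, normal_scores, margin=1.0):
    """Hinge ranking loss between one fight video and one normal video.

    fight_scores / normal_scores are per-segment anomaly scores in [0, 1]
    produced by a score generator. Only video-level labels are used: the
    highest-scoring segment of the fight video should outrank the
    highest-scoring segment of the normal video by at least `margin`.
    This is an assumed formulation, not necessarily the paper's loss.
    """
    return max(0.0, margin - max(fight_scores) + max(normal_scores))


# The fight video's top segment (0.9) beats the normal video's top
# segment (0.2), but not by the full margin, so some loss remains.
loss = mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.2])
```

The loss reaches zero once the top fight segment outscores the top normal segment by the margin, which pushes the score generator to localize fight segments without any frame-level annotation.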

* Accepted by ICTAI 2022
