Abstract: Many companies now own several types of AI accelerators, which together form heterogeneous clusters. Using these clusters efficiently for high-throughput large language model (LLM) inference services can significantly reduce costs and speed up task processing. However, LLM inference on heterogeneous clusters poses two main challenges. First, different deployment configurations can yield vastly different performance. The space of possible configurations is large, and evaluating the effectiveness of any specific setup is complex, so finding an optimal configuration is difficult. Second, LLM inference instances within a heterogeneous cluster have different processing capacities and therefore handle inference requests at different speeds. Estimating these capacities and designing a request-scheduling algorithm that fully exploits each instance is challenging. In this paper, we propose a high-throughput inference service system for heterogeneous clusters. First, the deployment configuration is optimized by modeling resource amounts and expected throughput and searching the configuration space exhaustively. Second, we propose a novel mechanism for scheduling requests among instances that fully accounts for the different processing capabilities of each instance. Extensive experiments show that the proposed scheduler improves throughput by 122.5% and 33.6% on two heterogeneous clusters, respectively.
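The two steps above (exhaustive configuration search over a throughput model, then capacity-aware request scheduling) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's model: the hypothetical `DeviceType`, `search_config`, and `schedule` names, the per-instance throughput table, the memory-fit rule, and the choice of tensor-parallel degree as the only configuration knob. The scheduler is a generic earliest-estimated-finish-time heuristic, substituted here because the abstract does not specify the proposed mechanism.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class DeviceType:
    """One accelerator type in the cluster (all numbers are hypothetical inputs)."""
    name: str
    count: int        # devices of this type available
    mem_gb: float     # memory per device
    # Hypothetical model: tensor-parallel degree -> modeled tokens/s per instance.
    throughput: dict = field(default_factory=dict)


def search_config(devices, model_mem_gb, tp_choices=(1, 2, 4, 8)):
    """Exhaustively try one tensor-parallel degree per device type and keep the
    combination with the highest modeled total throughput."""
    best, best_tput = None, -1.0
    for combo in product(tp_choices, repeat=len(devices)):
        total, feasible = 0.0, True
        for dev, tp in zip(devices, combo):
            # An instance must fit on the available devices, hold the model,
            # and have a throughput estimate in the (hypothetical) table.
            if tp > dev.count or tp * dev.mem_gb < model_mem_gb or tp not in dev.throughput:
                feasible = False
                break
            total += (dev.count // tp) * dev.throughput[tp]
        if feasible and total > best_tput:
            best, best_tput = combo, total
    return best, best_tput


def schedule(request_work, capacities):
    """Send each request to the instance with the earliest estimated finish time,
    (queued work + new work) / capacity. Static: ignores request completions."""
    load = [0.0] * len(capacities)
    assignment = []
    for work in request_work:
        i = min(range(len(capacities)), key=lambda j: (load[j] + work) / capacities[j])
        load[i] += work
        assignment.append(i)
    return assignment


if __name__ == "__main__":
    cluster = [
        DeviceType("A", count=8, mem_gb=80, throughput={1: 1000, 2: 2200, 4: 4000, 8: 7000}),
        DeviceType("B", count=8, mem_gb=24, throughput={2: 500, 4: 1200, 8: 2000}),
    ]
    print(search_config(cluster, model_mem_gb=40))   # ((2, 4), 11200.0)
    print(schedule([1] * 6, capacities=[2.0, 1.0]))  # [0, 0, 1, 0, 0, 1]
```

Exhaustive search stays affordable in this toy because the space has only `len(tp_choices) ** len(devices)` points. With equal-size requests, the instance with twice the capacity receives twice as many of them.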
Abstract: The quality of quantitative differential phase contrast (qDPC) reconstruction can be severely degraded by a background mismatch between the two obliquely illuminated images, yielding erroneous phase recovery. Such background mismatches may arise from illumination patterns, inhomogeneous media distributions, or other defocused layers. In previous reports, the background was calibrated manually, which is time-consuming and unstable, since a new calibration is needed whenever the optical system is modified. Manual calibration also cannot remove background contributed by defocused layers, nor can it handle highly dynamic observations in which the background changes over time. To address the background mismatch and increase experimental robustness, we propose Retinex-qDPC, which uses the images' edge features as the data fidelity term, yielding L2-Retinex-qDPC and L1-Retinex-qDPC for highly background-robust qDPC reconstruction. The split Bregman method is used to solve L1-Retinex-qDPC. We compare both Retinex-qDPC models against state-of-the-art DPC reconstruction algorithms, including total-variation-regularized qDPC and isotropic-qDPC, using both simulated and experimental data. Results show that Retinex-qDPC can significantly improve phase recovery quality by suppressing the impact of the background mismatch. Of the two models, L1-Retinex-qDPC outperforms L2-Retinex-qDPC and the other state-of-the-art DPC algorithms. Overall, Retinex-qDPC increases experimental robustness against background illumination without any modification of the optical system, which will benefit all qDPC applications.
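Why an edge-based fidelity term suppresses background mismatch can be seen in a minimal NumPy sketch of an L2 variant. It rests on several assumptions. "Edge features" are taken to be periodic finite-difference gradients. The two DPC transfer functions are replaced by hypothetical toy stand-ins (purely imaginary and odd), not the real pupil-based model. The L1 variant and its split Bregman solver are not shown. All function names are hypothetical. Differentiating both sides weights the fidelity at each spatial frequency by |D|^2, which vanishes at DC and is small at low frequencies, so a slowly varying background contributes little to the recovered phase.

```python
import numpy as np


def freq_grid(n):
    """Frequency coordinates (cycles/pixel) of an n x n FFT grid; fx varies along columns."""
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f)
    return fx, fy


def toy_transfer_functions(n):
    """Hypothetical stand-ins for the two DPC phase transfer functions:
    purely imaginary and odd along each illumination axis."""
    fx, fy = freq_grid(n)
    return [1j * fx / 0.5, 1j * fy / 0.5]


def grad_energy(n):
    """|Dx|^2 + |Dy|^2 of periodic forward differences in the Fourier domain."""
    fx, fy = freq_grid(n)
    return 4 * np.sin(np.pi * fx) ** 2 + 4 * np.sin(np.pi * fy) ** 2


def tikhonov_qdpc(dpc, H, alpha):
    """Baseline: argmin_phi sum_i ||H_i phi - d_i||^2 + alpha ||phi||^2 (closed form)."""
    num = sum(np.conj(h) * np.fft.fft2(d) for h, d in zip(H, dpc))
    den = sum(np.abs(h) ** 2 for h in H) + alpha
    return np.real(np.fft.ifft2(num / den))


def l2_gradient_qdpc(dpc, H, alpha):
    """Gradient-domain fidelity: argmin_phi sum_i ||grad(H_i phi - d_i)||^2 + alpha ||phi||^2."""
    g = grad_energy(dpc[0].shape[0])
    num = sum(g * np.conj(h) * np.fft.fft2(d) for h, d in zip(H, dpc))
    den = sum(g * np.abs(h) ** 2 for h in H) + alpha
    return np.real(np.fft.ifft2(num / den))


if __name__ == "__main__":
    n, alpha = 64, 1e-3
    X, _ = np.meshgrid(np.arange(n), np.arange(n))
    H = toy_transfer_functions(n)
    phase = np.cos(2 * np.pi * 8 * X / n)  # synthetic test phase
    clean = [np.real(np.fft.ifft2(h * np.fft.fft2(phase))) for h in H]
    background = np.cos(2 * np.pi * X / n)  # smooth mismatch added to one DPC image
    noisy = [clean[0] + background, clean[1]]
    err_tik = np.linalg.norm(tikhonov_qdpc(noisy, H, alpha) - phase)
    err_grad = np.linalg.norm(l2_gradient_qdpc(noisy, H, alpha) - phase)
    print(err_tik, err_grad)  # gradient-domain error is much smaller
```

The trade-off is that the gradient weighting also attenuates genuine low-frequency phase content, so in practice the regularization weight (here `alpha`) would need retuning.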