Abstract:Over the past year, Speculative Decoding has gained popularity as a technique for accelerating Large Language Model inference. While several methods have been introduced, most struggle to deliver satisfactory performance at batch sizes typical for data centers ($\geq 8$) and often involve significant deployment complexities. In this work, we offer a theoretical explanation of how Speculative Decoding can be effectively utilized with larger batch sizes. We also introduce a method that integrates seamlessly into existing systems without additional training or the complexity of deploying a small LLM. In a continuous batching setting, we achieve a 4x increase in throughput without any latency impact for short context generation, and a 1.7-2x improvement in both latency and throughput for longer contexts.
Abstract:In massive multiple-input multiple-output (MIMO) systems, the downlink transmission performance heavily relies on accurate channel state information (CSI). Constrained by the transmitted power, user equipment always transmits sounding reference signals (SRSs) to the base station through frequency hopping, which will be leveraged to estimate uplink CSI and subsequently predict downlink CSI. This paper aims to investigate joint channel estimation and prediction (JCEP) for massive MIMO with frequency hopping sounding (FHS). Specifically, we present a multiple-subband (MS) delay-angle-Doppler (DAD) domain channel model with off-grid basis to tackle the energy leakage problem. Furthermore, we formulate the JCEP problem with FHS as a multiple measurement vector (MMV) problem, facilitating the sharing of common CSI across different subbands. To solve this problem, we propose an efficient Off-Grid-MS hybrid message passing (HMP) algorithm under the constrained Bethe free energy (BFE) framework. Aiming to address the lack of prior CSI in practical scenarios, the proposed algorithm can adaptively learn the hyper-parameters of the channel by minimizing the corresponding terms in the BFE expression. To alleviate the complexity of channel hyper-parameter learning, we leverage the approximations of the off-grid matrices to simplify the off-grid hyper-parameter estimation. Numerical results illustrate that the proposed algorithm can effectively mitigate the energy leakage issue and exploit the common CSI across different subbands, acquiring more accurate CSI compared to state-of-the-art counterparts.
Abstract:Vessel segmentation in fundus is a key diagnostic capability in ophthalmology, and there are various challenges remained in this essential task. Early approaches indicate that it is often difficult to obtain desirable segmentation performance on thin vessels and boundary areas due to the imbalance of vessel pixels with different thickness levels. In this paper, we propose a novel two-stream Meticulous-Processing Network (MP-Net) for tackling this problem. To pay more attention to the thin vessels and boundary areas, we firstly propose an efficient hierarchical model automatically stratifies the ground-truth masks into different thickness levels. Then a novel two-stream adversarial network is introduced to use the stratification results with a balanced loss function and an integration operation to achieve a better performance, especially in thin vessels and boundary areas detecting. Our model is proved to outperform state-of-the-art methods on DRIVE, STARE, and CHASE_DB1 datasets.