Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hailin Zhang

SALE : Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling

May 30, 2025

Xiaodong Ji, Hailin Zhang, Fangcheng Fu, Bin Cui

Abstract:Many advanced Large Language Model (LLM) applications require long-context processing, but the self-attention module becomes a bottleneck during the prefilling stage of inference due to its quadratic time complexity with respect to sequence length. Existing sparse attention methods accelerate attention computation by skipping less significant regions of the attention map. However, these approaches typically perform coarse-grained inspection of the attention map, rendering considerable loss in model accuracy. In this paper, we propose SALE, a fine-grained sparse attention method that accelerates the long-context prefilling stage of LLM with negligible loss in model accuracy. SALE achieves fast and accurate fine-grained attention weight estimation through 4-bit quantized query-key products, followed by block-sparse attention to accelerate prefilling computations. For importance evaluation for query-key pairs, we adopt our Relative Attention Score metric, which offers significantly higher efficiency within our framework. We implement a custom CUDA kernel optimized for our approach for hardware efficiency, reducing the additional overhead to approximately 11% of the full attention latency. Notably, SALE requires no parameter training and can be seamlessly integrated into existing systems with trivial code modifications. Experiments on long-context benchmarks demonstrate that our method outperforms existing approaches in accuracy-efficiency trade-offs, achieving at least 3.36x speedups on Llama-3.1-8B for sequences longer than 64K while maintaining model quality.

Via

Access Paper or Ask Questions

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

May 12, 2025

Xiaomi LLM-Core Team, :, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu(+55 more)

Abstract:We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.

Via

Access Paper or Ask Questions

Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom,and Capacity

Aug 13, 2024

Haiyue Jing, Wenchi Cheng, Xiang-Gen Xia, Hailin Zhang

Figure 1 for Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom,and Capacity

Figure 2 for Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom,and Capacity

Figure 3 for Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom,and Capacity

Figure 4 for Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom,and Capacity

Abstract:The plane wave based wireless communications have becoming more and more matured, along with the well utilization of the traditional resources such as time and frequency. To further increase the capacity for rapidly increasing capacity demand of wireless communications, it is potential to use the twist wave, which has the orbital angular momentum (OAM). In this paper, we discuss the OAM based wireless communications in the aspect of orthogonality, degree of freedom (DoF), and capacity, where both the transmitter and the receiver use uniform circular array (UCA) antennas. In particular, we compare OAM based wireless communications with multiple-input-multiple-output (MIMO) based wireless communications in terms of DoF and capacity. Numerical results are presented to validate and evaluate that the DoF of OAM based wireless communications is greater than or equal to that of correlated MIMO based wireless communications when the transmitter and the receiver antennas are aligned well. The OAM based wireless communications can achieve larger capacity than the correlated MIMO in high signal-to-noise ratio (SNR) region under line-of-sight scenario.

Via

Access Paper or Ask Questions

Achieving Practical OAM Based Wireless Communications With Misaligned Transceiver

Aug 13, 2024

Wenchi Cheng, Haiyue Jing, Wei Zhang, Zan Li, Hailin Zhang

Abstract:Orbital angular momentum (OAM) has attracted much attention for radio vortex wireless communications due to the orthogonality among different OAM-modes. To maintain the orthogonality among different OAM modes at the receiver, the strict alignment between transmit and receive antennas is highly demanded. However, it is not practical to guarantee the transceiver alignment in wireless communications. The phase turbulence, resulting from the misaligned transceivers, leads to serious inter-mode interference among different OAM modes and therefore fail for signals detection of multiple OAM modes at the receiver. To achieve practical OAM based wireless communications, in this paper we investigate the radio vortex wireless communications with misaligned transmit and receive antennas. We propose a joint Beamforming and Pre-detection (BePre) scheme, which uses two unitary matrices to convert the channel matrix into the equivalent circulant matrix for keeping the orthogonality among OAM-modes at the receiver. Then, the OAM signals can be detected with the mode-decomposition scheme at the misaligned receiver. Extensive simulations obtained validate and evaluate that our developed joint BePre scheme can efficiently detect the signals of multiple OAM-modes for the misaligned transceiver and can significantly increase the spectrum efficiency.

Via

Access Paper or Ask Questions

Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Aug 13, 2024

Wenchi Cheng, Hailin Zhang, Liping Liang, Haiyue Jing, Zan Li

Figure 1 for Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Figure 2 for Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Figure 3 for Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Figure 4 for Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Abstract:By enabling very high bandwidth for radio communications, the millimeter-wave (mmWave), which can easily be integrated with massive-multiple-input-multiple-output (massive-MIMO) due to small antenna size, has been attracting growing attention as a candidate for the fifth-generation (5G) and 5G-beyond wireless communications networks. On the other hand, the communication over the orthogonal states/modes of orbital angular momentum (OAM) is a subset of the solutions offered by massive-MIMO communications. Traditional massive-MIMO based mmWave communications did not concern the potential spectrum-efficiency-gain (SE-gain) offered by orthogonal states of OAM. However, the highly expecting maximum SE-gain for OAM and massive-MIMO communications is the product of SE-gains offered by OAM and multiplexing-MIMO. In this paper, we propose the OAM-embedded-MIMO (OEM) communication framework to obtain the multiplicative SE-gain for joint OAM and massive-MIMO based mmWave wireless communications. We design the parabolic antenna for each uniform circular array antenna to converge OAM signals. Then, we develop the mode-decomposition and multiplexing-detection scheme to obtain the transmit signal on each OAM-mode of each transmit antenna. Also, we develop the OEM-water-filling power allocation policy to achieve the maximum multiplicative SE-gain for OEM communications. The extensive simulations obtained validate and evaluate our developed parabolic antenna based converging method, mode-decomposition and multiplexing-detection scheme, and OEM-water-filling policy, showing that our proposed OEM mmWave communications can significantly increase the spectrum-efficiency as compared with traditional massive-MIMO based mmWave communications.

Via

Access Paper or Ask Questions

Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Aug 10, 2024

Wenchi Cheng, Haiyue Jing, Wei Zhang, Keyi Zhang, Hailin Zhang

Figure 1 for Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Figure 2 for Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Figure 3 for Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Figure 4 for Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Abstract:The development of orbital angular momentum (OAM)-based radio vortex transmission presents a promising opportunity for increasing the capacity of wireless communication in correlated channels due to its inherent orthogonality among different OAM modes. One of the most popular schemes for high-efficient OAM transmission is the digital baseband associated with uniform circular array (UCA) based transceiver. However, the periodicity of complex-exponential feed makes the maximum number of orthogonal signals carried by multiple OAM modes generally restricted to the array-element number of UCA antenna, which poses an open question of how to employ more OAM modes given a fixed number of array elements. Furthermore, signals modulated with high-order OAM modes are difficult to be captured by the receiver due to their serious divergence as propagating in free space, thus severely limiting the capacity of radio vortex communications. To overcome the above challenges, in this paper based on the partly element-overlapped fractal geometry layout and effectively using low-order OAM modes, we propose the quasi-fractal UCA (QF-UCA) antenna based OAM multiplexing transmission. We perform the two-dimension OAM modulation (TOM) and demodulation (TOD) schemes with the orthogonal OAM mode number exceeding the array-element number, which is beyond the traditional concept of multiple antennas based wireless communications. Simulation results show that our proposed scheme can achieve more number of orthogonal multiplexing streams than the maximum number of orthogonal multiplexing corresponding to traditional multiple antenna systems.

Via

Access Paper or Ask Questions

The Rise of UAV Fleet Technologies for Emergency Wireless Communications in Harsh Environments

Jul 24, 2024

Zhuohui Yao, Wenchi Cheng, Wei Zhang, Tao Zhang, Hailin Zhang

Abstract:For unforeseen emergencies, such as natural disasters and pandemic events, it is highly demanded to cope with the explosive growth of mobile data traffic in extremely critical environments. An Unmanned aerial vehicle (UAV) fleet is an effective way to facilitate the Emergency wireless COmmunication NETwork (EcoNet). In this article, a MUlti-tier Heterogeneous UAV Network (MuHun), which is with different UAV fleets in different altitudes, is proposed to flexibly serve various emergencies. We refresh the key performance indicators of full coverage, network capacity, low latency, and energy efficiency in harsh environments. Then, we present the special challenges regarding shadowing-dominated complex channel model, energy supply limited short-endurance, various communication mechanisms coexistence, and communication island for underground users in UAV-based EcoNet, followed by the MuHun-based EcoNet architecture and its advantages. Furthermore, some potential solutions such as the new hybrid-channel adapted resource allocation, reconfigurable intelligent surface assisted UAV communications, competitive heterogenous-networks, and magnetic induction based air-to-ground/underground communications are discussed to effectively achieve full coverage, high capacity, high energy efficiency, and diverse qualities of services for EcoNets in harsh environments.

Via

Access Paper or Ask Questions

Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Jul 24, 2024

Zhuohui Yao, Wenchi Cheng, Wei Zhang, Hailin Zhang

Figure 1 for Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Figure 2 for Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Figure 3 for Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Figure 4 for Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Abstract:For unforeseen natural disasters, such as earthquakes, hurricanes, and floods, etc., the traditional communication infrastructure is unavailable or seriously disrupted along with persistent secondary disasters. Under such circumstances, it is highly demanded to deploy emergency wireless communication (EWC) networks to restore connectivity in accident/incident areas. The emerging fifth-generation (5G)/beyond-5G (B5G) wireless communication system, like unmanned aerial vehicle (UAV) assisted networks and intelligent reflecting surface (IRS) based communication systems, are expected to be designed or re-farmed for supporting temporary high quality communications in post-disaster areas. However, the channel characteristics of post-disaster areas quickly change as the secondary disaster resulted topographical changes, imposing new but critical challenges for EWC networks. In this paper, we propose a novel heterogeneous $\mathcal{F}$ composite fading channel model for EWC networks which accurately models and characterizes the composite fading channel with reflectors, path-loss exponent, fading, and shadowing parameters in 5G-UAV based EWC networks. Based on the model, we develop the optimal power allocation scheme with the simple closed-form expression and the numerical results based optimal joint bandwidth-power allocation scheme. We derive the corresponding capacities and compare the energy efficiency between IRS and traditional relay based 5G-UAVs. Numerical results show that the new heterogeneous Fisher-Snedecor $\mathcal{F}$ composite fading channel adapted resource allocation schemes can achieve higher capacity and energy efficiency than those of traditional channel model adapted resource allocation schemes, thus providing better communications service for post-disaster areas.

Via

Access Paper or Ask Questions

Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Jul 24, 2024

Jianyu Wang, Wenchi Cheng, Wei Zhang, Hailin Zhang

Figure 1 for Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Figure 2 for Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Figure 3 for Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Figure 4 for Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Abstract:In this paper, we propose a virtual full-duplex (VFD) technique with zero-interval modulation and sampling (ZIMS), where two half-duplex (HD) transceivers can simultaneously transmit signals and each transceiver can effectively receive the desired information. In ZIMS-VFD, the transceiver inserts a zero-interval for each symbol in the transmit signal and provides self-interference (SI)-free intervals for itself. Meanwhile, it samples the receive signal in the provided SI-free intervals and restores the desired symbols. Based on orthogonal frequency division multiplexing (OFDM), we formulate the system model and show the transmit signal structure. Then, we give the transceiver design for single input single output (SISO) ZIMS-VFD and extend it to multiple input multiple output (MIMO) communications. Numerical results verify our theoretical analyses and show that ZIMS-VFD can effectively increase the capacity and approach the FD without SI.

Via

Access Paper or Ask Questions

Mode Hopping for Anti-Jamming in Radio Vortex Wireless Communications

Jul 18, 2024

Liping Liang, Wenchi Cheng, Wei Zhang, Hailin Zhang

Figure 1 for Mode Hopping for Anti-Jamming in Radio Vortex Wireless Communications

Figure 2 for Mode Hopping for Anti-Jamming in Radio Vortex Wireless Communications

Figure 3 for Mode Hopping for Anti-Jamming in Radio Vortex Wireless Communications

Figure 4 for Mode Hopping for Anti-Jamming in Radio Vortex Wireless Communications

Abstract:Frequency hopping (FH) has been widely used as a powerful technique for antijamming in wireless communications. However, as the wireless spectrum is becoming more and more crowded, it is very difficult to achieve efficient antijamming results with FH-based schemes. Orbital angular momentum (OAM), which provides the new angular/mode dimension for wireless communications, offers an intriguing way for antijamming. In this paper, we propose to use the orthogonality of OAM-modes for antijamming in wireless communications. In particular, we propose the mode hopping (MH) scheme for antijamming within the narrow frequency band. We derive the closed-form expression of bit error rate (BER) for multiple users scenario with our developed MH scheme. Our developed MH scheme can achieve the same antijamming results within the narrow frequency band as compared with the conventional wideband FH scheme. Furthermore, we propose mode-frequency hopping (MFH) scheme, which jointly uses our developed MH scheme and the conventional FH scheme to further decrease the BER for wireless communication. Numerical results are presented to show that the BER of our developed MH scheme within the narrow frequency band is the same with that of the conventional wideband FH scheme. Moreover, the BER of our developed MFH schemes is much smaller than that of the conventional FH schemes for wireless communications.

* 15 pages, accepted by IEEE Transactions on Vehicular Technology (Volume: 67, Issue: 8, August 2018)

Via

Access Paper or Ask Questions