6G mobile networks are differentiated from 5G by two new usage scenarios: distributed sensing and edge AI. Their natural integration, termed integrated sensing and edge AI (ISEA), promises to create a platform that uses environment perception to make intelligent decisions and take real-time actions. A basic operation in ISEA is for a fusion center to acquire and fuse features of spatial sensing data distributed across many agents. To overcome the resulting communication bottleneck, caused by multiple access by numerous agents over hostile wireless channels, we propose a novel framework, called Spatial Over-the-Air Fusion (Spatial AirFusion), which exploits radio waveform superposition to aggregate spatially sparse features over the air. The technology is more sophisticated than conventional Over-the-Air Computing (AirComp) in that it supports simultaneous aggregation over multiple voxels, which partition the 3D sensing region, and across multiple subcarriers. Its efficiency and robustness derive from exploiting both spatial feature sparsity and multiuser channel diversity to pair voxel-level aggregation tasks with subcarriers so as to maximize the minimum receive SNR among voxels under instantaneous power constraints. To optimally solve the resulting mixed-integer Voxel-Carrier Pairing and Power Allocation (VoCa-PPA) problem, the proposed approach hinges on two results: (1) the optimal power allocation, derived as a closed-form function of the voxel-carrier pairing, and (2) a property of VoCa-PPA that dramatically reduces the solution-space dimensionality. Both a low-complexity greedy algorithm and an optimal tree-search-based approach are designed for VoCa-PPA. Extensive simulations using real datasets show that Spatial AirFusion achieves significant error reduction and accuracy improvement compared with conventional AirComp that lacks awareness of spatial sparsity.