Airborne acquisition and on-road mobile mapping provide complementary 3D information of an urban landscape: the former acquires roof structures, ground, and vegetation at a large scale, but lacks the facade and street-side details, while the latter is incomplete for higher floors and often totally misses out on pedestrian-only areas or undriven districts. In this work, we introduce an approach that efficiently unifies a detailed street-side Structure-from-Motion (SfM) or Multi-View Stereo (MVS) point cloud and a coarser but more complete point cloud from airborne acquisition in a joint surface mesh. We propose a point cloud blending and a volumetric fusion based on ray casting across a 3D tetrahedralization (3DT), extended with data reduction techniques to handle large datasets. To the best of our knowledge, we are the first to adopt a 3DT approach for airborne/street-side data fusion. Our pipeline exploits typical characteristics of airborne and ground data, and produces a seamless, watertight mesh that is both complete and detailed. Experiments on 3D urban data from multiple sources and different data densities show the effectiveness and benefits of our approach.