The flying ad hoc network (FANET) will play a crucial role in the B5G/6G era since it provides wide coverage and on-demand deployment services in a distributed manner. The detection of Sybil attacks is essential to ensure trusted communication in FANET. Nevertheless, the conventional methods only utilize the untrusted information that UAV nodes passively ``heard'' from the ``auditory" domain (AD), resulting in severe communication disruptions and even collision accidents. In this paper, we present a novel VA-matching solution that matches the neighbors observed from both the AD and the ``visual'' domain (VD), which is the first solution that enables UAVs to accurately correlate what they ``see'' from VD and ``hear'' from AD to detect the Sybil attacks. Relative entropy is utilized to describe the similarity of observed characteristics from dual domains. The dynamic weight algorithm is proposed to distinguish neighbors according to the characteristics' popularity. The matching model of neighbors observed from AD and VD is established and solved by the vampire bat optimizer. Experiment results show that the proposed VA-matching solution removes the unreliability of individual characteristics and single domains. It significantly outperforms the conventional RSSI-based method in detecting Sybil attacks. Furthermore, it has strong robustness and achieves high precision and recall rates.