Decentralized federated learning (DFL), which has its roots in distributed optimization, is an emerging paradigm for leveraging the rapidly growing data generated by wireless devices in a fully distributed manner. DFL enables devices to jointly train a machine learning model through device-to-device (D2D) communication, without the coordination of a parameter server. However, deploying DFL over wireless networks faces two pivotal challenges. First, communication is a critical bottleneck, because neighboring devices must exchange messages extensively to share their learned models. Second, consensus becomes increasingly difficult as the number of devices grows, because no central server is available to coordinate them. To overcome these difficulties, this paper employs over-the-air computation (AirComp) to improve communication efficiency by exploiting the superposition property of analog waveforms in multiple-access channels, and introduces a mixing matrix mechanism that promotes consensus through the spectral properties of symmetric doubly stochastic matrices. Specifically, we develop a novel multiple-input multiple-output over-the-air DFL (MIMO OA-DFL) framework to study the over-the-air DFL problem over MIMO multiple-access channels. We conduct a general convergence analysis that quantitatively captures the influence of the aggregation weights and the communication error on MIMO OA-DFL performance in \emph{ad hoc} networks. The results show that the communication error, together with the spectral gap of the mixing matrix, has a significant impact on learning performance. Based on this analysis, we formulate a joint communication-learning optimization problem to optimize the transceiver beamformers and the mixing matrix. Extensive numerical experiments reveal the characteristics of different topologies and demonstrate the substantial learning performance gains of the proposed algorithm.
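
To make the role of the mixing matrix concrete, the following minimal sketch builds a symmetric doubly stochastic matrix for a small ring topology and estimates its second-largest eigenvalue modulus, whose distance from one is the spectral gap. It is an illustration only and not the paper's algorithm: the Metropolis weight rule, the six-device ring, the error-free consensus step, and all function names are assumptions introduced here, and the sketch ignores the wireless channel, AirComp, and beamforming entirely.

```python
import math

def metropolis_weights(adj):
    """Build a symmetric doubly stochastic mixing matrix from an undirected
    adjacency list (no self-loops). The Metropolis rule is an assumed choice."""
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][i] = 1.0 - sum(W[i])  # self-weight so that each row sums to one
    return W

def mix(W, x):
    """One error-free consensus step: each device takes a weighted average
    of its own value and its neighbors' values."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def second_eigenvalue_modulus(W, iters=200):
    """Power iteration on W restricted to the subspace orthogonal to the
    all-ones vector; for symmetric W this estimates the second-largest
    eigenvalue modulus."""
    n = len(W)
    x = [float(i) for i in range(n)]
    rho = 0.0
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the consensus component
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        x = mix(W, x)
        rho = math.sqrt(sum(v * v for v in x))
    return rho

# Hypothetical 6-device ring: device i talks to devices i-1 and i+1.
n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
W = metropolis_weights(ring)
rho = second_eigenvalue_modulus(W)
spectral_gap = 1.0 - rho

# Repeated mixing drives every local value toward the network average.
x = [float(i) for i in range(n)]
for _ in range(50):
    x = mix(W, x)
print(f"spectral gap ~ {spectral_gap:.4f}, values after mixing: {x}")
```

In this sketch a larger spectral gap means faster contraction toward the average per mixing step, which is the intuition behind treating the mixing matrix as an optimization variable; a denser topology would generally give a larger gap at the cost of more D2D links.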