We study collaborative machine learning (ML) systems in which a massive dataset is distributed across independent workers, each of which computes its local gradient estimate based on its own dataset. Workers send their estimates through a multipath fading multiple access channel (MAC), using orthogonal frequency division multiplexing (OFDM) to mitigate the frequency selectivity of the channel. We assume that the parameter server (PS) employs multiple antennas to align the received signals, with no channel state information (CSI) available at the workers. To reduce power consumption and hardware cost, we employ complex-valued low-resolution digital-to-analog converters (DACs) at the transmitter side and analog-to-digital converters (ADCs) at the receiver side, and study the effects of these practical low-cost components on the learning performance of the system. Our theoretical analysis shows that the impairments caused by low-resolution DACs and ADCs, including the extreme case of one-bit DACs and ADCs, do not prevent the convergence of the learning algorithm, and that the multipath channel effects vanish when a sufficient number of antennas is used at the PS. We also validate our theoretical results via simulations, demonstrating that low-resolution, even one-bit, DACs and ADCs cause only a slight decrease in learning accuracy.
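The central claim, that one-bit quantization of the workers' gradient estimates does not prevent convergence, can be illustrated with a minimal sketch. This is not the paper's over-the-air scheme (no fading channel, OFDM, or antenna processing is modeled); it only shows distributed gradient descent where each worker transmits the sign of its local gradient, mimicking one-bit DACs, and the server averages the quantized estimates. All names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data, sharded across hypothetical workers.
n_workers, n_samples, dim = 8, 64, 5
w_true = rng.normal(size=dim)
X = [rng.normal(size=(n_samples, dim)) for _ in range(n_workers)]
y = [Xk @ w_true + 0.01 * rng.normal(size=n_samples) for Xk in X]

def local_gradient(w, Xk, yk):
    """Least-squares gradient computed on one worker's local shard."""
    return Xk.T @ (Xk @ w - yk) / len(yk)

w = np.zeros(dim)
lr = 0.05
for _ in range(200):
    # One-bit quantization at each worker: keep only the sign of each entry.
    signs = [np.sign(local_gradient(w, X[k], y[k])) for k in range(n_workers)]
    # The server aggregates the quantized gradients by averaging them.
    w -= lr * np.mean(signs, axis=0)

loss = np.mean([np.mean((X[k] @ w - y[k]) ** 2) for k in range(n_workers)])
print(f"final mean-squared error: {loss:.4f}")
```

Despite each worker sending only one bit per gradient coordinate, the averaged sign still points in a descent direction on average, so the model approaches the true parameters, consistent with the convergence result stated above.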