We present a new application of deep learning to infer the masses of galaxy clusters directly from images of the microwave sky. Effectively, this is a novel approach to determining the scaling relation between a cluster's Sunyaev-Zel'dovich (SZ) effect signal and its mass. The deep learning algorithm used is mResUNet, a modified feed-forward network that combines residual learning, convolution layers with different dilation rates, image regression activation, and a U-Net framework. We train and test the model on simulated images of the microwave sky that include signals from the cosmic microwave background (CMB), dusty and radio galaxies, and instrumental noise, as well as the cluster's own SZ signal. The simulated cluster sample covers the mass range $1\times 10^{14}~\rm M_{\odot} < M_{200\rm c} < 8\times 10^{14}~\rm M_{\odot}$ at $z=0.7$. The trained model estimates the cluster masses with a $1\sigma$ uncertainty $\Delta M/M \leq 0.2$, consistent with the input scatter on the SZ signal of 20%. Using the Magneticum hydrodynamical simulations, we verify that the model works for realistic SZ profiles even when trained on azimuthally symmetric SZ profiles. We find that the model returns unbiased mass estimates for the hydrodynamical simulations, with a scatter consistent with the SZ-mass scatter in the light cones.
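To make the phrase "residual learning, convolution layers with different dilation rates" concrete, the following is a minimal one-dimensional toy in pure Python. It is not the mResUNet architecture: the function names `dilated_conv1d` and `residual_block`, the parallel arrangement of branches, the fixed kernel and the choice of dilation rates are all assumptions made for illustration only. It shows how a dilated kernel samples the input at wider spacing, and how a skip connection adds the branch outputs back onto the input.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with a dilated kernel.

    Kernel tap j reads the input at offset (j - len(kernel) // 2) * dilation,
    so a larger dilation widens the receptive field without adding weights.
    """
    n = len(signal)
    centre = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - centre) * dilation
            if 0 <= idx < n:  # positions outside the input count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def residual_block(signal, kernel, dilations):
    """Hypothetical residual block: input plus ReLU-activated dilated branches."""
    out = [float(x) for x in signal]  # skip connection starts from the input
    for d in dilations:
        branch = dilated_conv1d(signal, kernel, d)
        out = [o + max(0.0, b) for o, b in zip(out, branch)]
    return out


# A unit impulse makes the sampling pattern of each dilation rate visible.
impulse = [0, 0, 0, 1, 0, 0, 0]
box = [1.0, 1.0, 1.0]
narrow = dilated_conv1d(impulse, box, 1)
wide = dilated_conv1d(impulse, box, 2)
combined = residual_block(impulse, box, (1, 2))
```

With dilation 1 the three kernel taps touch adjacent pixels, while with dilation 2 they skip every other pixel, so the same three weights span five pixels. In an actual network the kernel weights would be learned and the operation would be two-dimensional.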