Multimode fiber (MMF) imaging aided by machine learning holds promise for numerous applications, including medical endoscopy. A key challenge for this technology is the sensitivity of the fiber's modal transmission characteristics to environmental perturbations. Here, we show experimentally that an MMF imaging scheme based on a neural network (NN) can achieve results that are highly robust to thermal perturbations. For example, natural images are successfully reconstructed as the MMF's temperature is varied by up to 50$^{\circ}$C relative to the training scenario, despite the substantial variations in the speckle patterns that these thermal changes cause. A dense NN with a single hidden layer is found to outperform a convolutional NN of the kind used for standard computer vision tasks. In addition, we demonstrate that the NN parameters can be used to gain insight into the MMF's properties by reconstructing the approximate transmission matrices, and we show that the image reconstruction accuracy is directly related to the temperature dependence of the MMF's transmission characteristics.
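
For concreteness, the forward pass of a dense NN with a single hidden layer, mapping a flattened speckle pattern to a flattened image, might be sketched as follows. This is only an illustrative sketch and not the implementation used in this work: the layer sizes, the sigmoid activations, the random (untrained) weights, and all function names are assumptions made for the example.

```python
import math
import random


def init_dense(n_in, n_out, rng):
    """Create a fully connected layer: small random weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map W x + b for one fully connected layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    """Element-wise logistic function (an assumed choice of activation)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in values]


def reconstruct(speckle, params):
    """Single-hidden-layer dense network: speckle intensities -> image pixels."""
    (w1, b1), (w2, b2) = params
    hidden = sigmoid(dense(speckle, w1, b1))
    return sigmoid(dense(hidden, w2, b2))


# Hypothetical sizes: an 8x8 speckle pattern, 32 hidden units, a 4x4 image.
N_SPECKLE, N_HIDDEN, N_IMAGE = 64, 32, 16

rng = random.Random(0)
params = (init_dense(N_SPECKLE, N_HIDDEN, rng),
          init_dense(N_HIDDEN, N_IMAGE, rng))

# Stand-in for a measured speckle pattern (random intensities in [0, 1)).
speckle = [rng.random() for _ in range(N_SPECKLE)]
image = reconstruct(speckle, params)
print(len(image))
```

In practice the weights would be fitted to pairs of measured speckle patterns and known input images. The sketch above shows only the structure of the mapping, in which every output pixel depends on every speckle pixel through one hidden layer.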