With extremely high temporal resolution, event cameras have a large potential for robotics and computer vision. However, their asynchronous imaging mechanism often aggravates the measurement sensitivity to noises and brings a physical burden to increase the image spatial resolution. To recover high-quality intensity images, one should address both denoising and super-resolution problems for event cameras. Since events depict brightness changes, with the enhanced degeneration model by the events, the clear and sharp high-resolution latent images can be recovered from the noisy, blurry and low-resolution intensity observations. Exploiting the framework of sparse learning, the events and the low-resolution intensity observations can be jointly considered. Based on this, we propose an explainable network, an event-enhanced sparse learning network (eSL-Net), to recover the high-quality images from event cameras. After training with a synthetic dataset, the proposed eSL-Net can largely improve the performance of the state-of-the-art by 7-12 dB. Furthermore, without additional training process, the proposed eSL-Net can be easily extended to generate continuous frames with frame-rate as high as the events.