Event recognition from still images is of great importance for image understanding. However, compared with event recognition in videos, there are much fewer research works on event recognition in images. This paper addresses the issue of event recognition from images and proposes an effective method with deep neural networks. Specifically, we design a new architecture, called Object-Scene Convolutional Neural Network (OS-CNN). This architecture is decomposed into object net and scene net, which extract useful information for event understanding from the perspective of objects and scene context, respectively. Meanwhile, we investigate different network architectures for OS-CNN design, and adapt the deep (AlexNet) and very-deep (GoogLeNet) networks to the task of event recognition. Furthermore, we find that the deep and very-deep networks are complementary to each other. Finally, based on the proposed OS-CNN and comparative study of different network architectures, we come up with a solution of five-stream CNN for the track of cultural event recognition at the ChaLearn Looking at People (LAP) challenge 2015. Our method obtains the performance of 85.5% and ranks the $1^{st}$ place in this challenge.