We propose a method which can detect events in videos by modeling the change in appearance of the event participants over time. This method makes it possible to detect events which are characterized not by motion, but by the changing state of the people or objects involved. This is accomplished by using object detectors as output models for the states of a hidden Markov model (HMM). The method allows an HMM to model the sequence of poses of the event participants over time, and is effective for poses of humans and inanimate objects. The ability to use existing object-detection methods as part of an event model makes it possible to leverage ongoing work in the object-detection community. A novel training method uses an EM loop to simultaneously learn the temporal structure and object models automatically, without the need to specify either the individual poses to be modeled or the frames in which they occur. The E-step estimates the latent assignment of video frames to HMM states, while the M-step estimates both the HMM transition probabilities and state output models, including the object detectors, which are trained on the weighted subset of frames assigned to their state. A new dataset was gathered because little work has been done on events characterized by changing object pose, and suitable datasets are not available. Our method produced results superior to that of comparison systems on this dataset.