We present a universal framework for learning the behavior of dynamical systems from observations. We formulate the learning task as a constrained optimization problem that can be solved efficiently with the adjoint sensitivity method. Our scheme is flexible with regard to the choice of model, and existing knowledge can be readily incorporated for hybrid learning. We demonstrate the effectiveness of our scheme by learning a variety of systems, including a stiff Van der Pol oscillator, a chaotic Lorenz system, and the Kuramoto-Sivashinsky equation. We also include examples of hybrid learning and learning from noisy observations.