Numerical integrators could be used to form interpolation conditions when training neural networks to approximate the vector field of an ordinary differential equation (ODE) from data. When numerical one-step schemes such as the Runge-Kutta methods are used to approximate the temporal discretization of an ODE with a known vector field, properties such as symmetry and stability are much studied. Here, we show that using mono-implicit Runge-Kutta methods of high order allows for accurate training of Hamiltonian neural networks on small datasets. This is demonstrated by numerical experiments where the Hamiltonian of the chaotic double pendulum in addition to the Fermi-Pasta-Ulam-Tsingou system is learned from data.