This letter investigates target position estimation in integrated sensing and communications (ISAC) networks composed of multiple cooperating monostatic base stations (BSs). Each BS employs a multiple-input multiple-output (MIMO) orthogonal time frequency space (OTFS) scheme, which allows communication and sensing to coexist. A general cooperative maximum likelihood (ML) framework is derived that estimates the target position directly in a common reference frame, rather than fusing local range and angle estimates obtained at each BS. Positioning accuracy is evaluated in single-target scenarios for a varying number of cooperating BSs, using the root mean square error (RMSE) of the position estimate and comparing it against the Cram\'er-Rao lower bound. Numerical results show that the position RMSE of the ML framework decreases as the number of cooperating BSs increases.
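To illustrate the idea of estimating the position directly in a common frame, the following is a minimal sketch and not the estimator derived in the letter. It assumes a heavily simplified model: each monostatic BS observes a single-antenna, multi-subcarrier echo whose only position-dependent parameter is the round-trip delay, with an unknown complex gain per BS and white Gaussian noise of equal variance at all BSs. Under these assumptions, the ML position maximizes the sum over BSs of the matched-filter energy, evaluated here by grid search. The names `steering` and `cooperative_ml`, the BS layout, the subcarrier spacing, and the noise level are all hypothetical choices made for illustration; OTFS modulation and MIMO angle information are omitted.

```python
import numpy as np

C = 3e8  # speed of light in m/s


def steering(freqs, bs_pos, p):
    """Frequency-domain signature of a monostatic echo from position p."""
    tau = 2.0 * np.linalg.norm(p - bs_pos) / C  # round-trip delay
    return np.exp(-2j * np.pi * freqs * tau)


def cooperative_ml(y_list, bs_list, freqs, grid):
    """Return the grid point maximizing the summed per-BS matched-filter energy."""
    best_p, best_cost = None, -np.inf
    for p in grid:
        cost = 0.0
        for y, bs in zip(y_list, bs_list):
            a = steering(freqs, bs, p)
            # |a^H y|^2 / ||a||^2 is the ML cost after concentrating out
            # the unknown complex gain of this BS.
            cost += np.abs(np.vdot(a, y)) ** 2 / len(freqs)
        if cost > best_cost:
            best_p, best_cost = p, cost
    return best_p


# Hypothetical toy scenario: three BSs, one target, 64 subcarriers spaced 1 MHz.
rng = np.random.default_rng(0)
freqs = np.arange(64) * 1e6
bs_list = [np.array([0.0, 0.0]), np.array([100.0, 0.0]), np.array([0.0, 100.0])]
target = np.array([40.0, 60.0])

y_list = []
for bs in bs_list:
    gain = np.exp(2j * np.pi * rng.random())  # unknown per-BS complex gain
    noise = 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
    y_list.append(gain * steering(freqs, bs, target) + noise)

# Candidate positions on a 1 m grid in the common reference frame.
grid = [np.array([x, y], dtype=float) for x in range(101) for y in range(101)]
p_hat = cooperative_ml(y_list, bs_list, freqs, grid)
error = np.linalg.norm(p_hat - target)
print("estimate:", p_hat, "error [m]:", error)
```

Repeating the experiment over many noise realizations and averaging the squared error would give an empirical position RMSE; passing only a subset of `bs_list` and `y_list` shows how the estimate behaves with fewer cooperating BSs.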