We investigate joint localization and synchronization in the downlink of a distributed multiple-input multiple-output (D-MIMO) system, aiming to estimate the position and phase offset of a single-antenna user equipment (UE) from the downlink transmissions of multiple phase-synchronized, multi-antenna access points (APs). We propose two transmission protocols, in which the APs transmit either sequentially (P1) or simultaneously (P2), together with maximum-likelihood (ML) estimators that either exploit phase information (coherent estimator) or disregard it (non-coherent estimator). Simulation results show that downlink D-MIMO holds significant potential for high-accuracy localization, and that P2 provides better localization performance and lower transmission latency than P1.