Next-generation (6G) wireless networks are expected to provide not only seamless, high-data-rate communications but also ubiquitous sensing services. By offering vast spatial degrees of freedom (DoFs), ultra-massive multiple-input multiple-output (UM-MIMO) technology is a key enabler of both sensing and communications in 6G. However, the adoption of UM-MIMO shifts electromagnetic propagation from the far-field to the near-field regime, which poses new challenges for system design. Specifically, near-field effects call for highly non-linear spherical wave models, rendering existing designs based on plane wave assumptions ineffective. In this paper, we focus on two crucial tasks, namely localization (for sensing) and channel estimation (for communications), and investigate their joint design by exploiting near-field propagation characteristics, so that each task benefits the other. In addition, multiple base stations (BSs) collaborate within a cooperative localization framework. To address the joint channel estimation and cooperative localization problem in near-field UM-MIMO systems, we propose a variational Newtonized near-field channel estimation (VNNCE) algorithm and a Gaussian fusion cooperative localization (GFCL) algorithm. The VNNCE algorithm exploits the spatial DoFs provided by the near-field channel to obtain position-related soft information, while the GFCL algorithm fuses this soft information to achieve more accurate localization. Finally, we introduce a joint architecture that seamlessly integrates channel estimation and cooperative localization.
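To make the spherical-versus-plane-wave distinction concrete, the following is a minimal sketch built on the textbook uniform linear array (ULA) model. It is not the VNNCE or GFCL algorithm. The array geometry, the 256-element size, the roughly 100 GHz carrier (wavelength 3 mm), the half-wavelength spacing, and all function names are illustrative assumptions. Under the near-field model, each element's phase depends on the exact element-to-user distance, so the steering vector encodes both angle and distance. The plane-wave model keeps only the term that is linear in element position and therefore depends on angle alone.

```python
import cmath
import math


def element_offsets(n_antennas):
    """Symmetric element indices delta_n = n - (N - 1) / 2 for n = 0..N-1."""
    return [n - (n_antennas - 1) / 2 for n in range(n_antennas)]


def near_field_steering(n_antennas, spacing, wavelength, r, theta):
    """Spherical-wave response of a ULA on the y-axis (hypothetical helper).

    The user sits at distance r from the array centre and angle theta from
    broadside, i.e. at (r cos(theta), r sin(theta)).
    """
    k = 2 * math.pi / wavelength
    vec = []
    for delta in element_offsets(n_antennas):
        y = delta * spacing
        # Exact distance from element n at (0, y) to the user
        r_n = math.sqrt(r**2 + y**2 - 2 * r * y * math.sin(theta))
        vec.append(cmath.exp(-1j * k * (r_n - r)) / math.sqrt(n_antennas))
    return vec


def far_field_steering(n_antennas, spacing, wavelength, theta):
    """Plane-wave response: keeps only the first-order term r_n - r ~ -y sin(theta)."""
    k = 2 * math.pi / wavelength
    return [
        cmath.exp(1j * k * delta * spacing * math.sin(theta)) / math.sqrt(n_antennas)
        for delta in element_offsets(n_antennas)
    ]


def correlation(a, b):
    """Normalised correlation |a^H b| between two unit-norm steering vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b)))


def rayleigh_distance(aperture, wavelength):
    """Conventional near-/far-field boundary 2 D^2 / lambda."""
    return 2 * aperture**2 / wavelength


# Assumed parameters: 256 elements, ~100 GHz carrier, half-wavelength spacing
N = 256
WAVELENGTH = 3e-3
SPACING = WAVELENGTH / 2
THETA = 0.3  # radians from broadside

if __name__ == "__main__":
    print("Rayleigh distance [m]:", rayleigh_distance((N - 1) * SPACING, WAVELENGTH))
    near_5 = near_field_steering(N, SPACING, WAVELENGTH, 5.0, THETA)
    near_10 = near_field_steering(N, SPACING, WAVELENGTH, 10.0, THETA)
    far = far_field_steering(N, SPACING, WAVELENGTH, THETA)
    print("near(5 m) vs far:", correlation(near_5, far))
    print("near(5 m) vs near(10 m):", correlation(near_5, near_10))
```

Within the Rayleigh distance 2D²/λ (about 97.5 m for these assumed parameters), the plane-wave vector correlates poorly with the true spherical-wave response. Two near-field vectors at the same angle but different distances also remain distinguishable. This distance sensitivity is what makes near-field channel parameters informative about user position, whereas far-field parameters carry angle information only.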