Accurate localization is crucial for many applications, including autonomous vehicles and next-generation wireless networks. However, the reliability and precision of Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS), are degraded by multipath errors and non-line-of-sight conditions. This paper presents a novel approach that improves GPS accuracy by combining visual data from RGB cameras with wireless signals captured at millimeter-wave (mmWave) and sub-terahertz (sub-THz) base stations. We propose a sensing-aided framework for (i) site-specific GPS data characterization and (ii) GPS position de-noising that uses multi-modal visual and wireless information. We validate the approach in a realistic Vehicle-to-Infrastructure (V2I) scenario using a comprehensive real-world dataset, where it reduces the localization error to sub-meter levels. This method advances precise localization and is particularly beneficial for high-mobility applications in 5G-and-beyond networks.