Semantic communication, augmented by knowledge bases (KBs), offers substantial reductions in transmission overhead and resilience to errors. However, existing methods predominantly rely on end-to-end training to construct KBs, often failing to fully capitalize on the rich information available at communication devices. Motivated by the growing convergence of sensing and communication, we introduce a novel Position-Aided Semantic Communication (PASC) framework, which integrates localization into semantic transmission. This framework is particularly designed for position-based image communication, such as real-time uploading of outdoor camera-view images. By utilizing the position, the framework retrieves corresponding maps, and then an advanced foundation model (FM)-driven view generator is employed to synthesize images closely resembling the target images. The PASC framework further leverages the FM to fuse the synthesized image with deviations from the real one, enhancing semantic reconstruction. Notably, the framework is highly flexible, capable of adapting to dynamic content and fluctuating channel conditions through a novel FM-based parameter optimization strategy. Additionally, the challenges of real-time deployment are addressed, with the development of a hardware testbed to validate the framework. Simulations and real-world tests demonstrate that the proposed PASC approach not only significantly boosts transmission efficiency, but also remains robust in diverse and evolving transmission scenarios.