Soft robotics is a modern robotic paradigm for performing dexterous interactions with the surroundings via morphological flexibility. Autonomous operation requires soft robots to be capable of proprioception and necessitates a calibration process. Numerical simulation can serve both requirements with high computational efficiency. However, the gap between the simulated and real domains limits the accurate, generalized application of this approach. Herein, we propose an unsupervised domain adaptation framework for data-efficient, generalized alignment of these heterogeneous sensor domains. A dual cross-modal autoencoder is designed to match the sensor domains at the feature level without extensive labeling, enabling computationally efficient transfer to various tasks. As a proof of concept, the methodology is applied to a well-known soft robot design, the multigait soft robot, and to two fundamental perception tasks for autonomous robot operation: high-fidelity shape estimation and collision detection. The resulting perception demonstrates a digital-twinned calibration process in both the simulated and real domains. The proposed design outperforms prevalent existing benchmarks on both perception tasks. This unsupervised framework points to a new approach to imparting embodied intelligence to soft robotic systems through the integration of simulation.