Physical adversarial attacks pose a severe threat to DNN-based object detectors. To enhance security, visible and infrared sensors are deployed together in various scenarios, a combination that has proven effective at defeating existing single-modal physical attacks. To demonstrate the risks that remain in such settings, we design a unified adversarial patch that performs cross-modal physical attacks, evading detection in both modalities simultaneously with a single patch. Because visible and infrared sensors rely on different imaging mechanisms, our work manipulates the patch's shape, a feature whose changes are captured in both modalities. To this end, we propose a novel boundary-limited shape optimization approach that yields compact and smooth patch shapes, making the patch easy to implement in the physical world. We also introduce a score-aware iterative evaluation method that balances the fooling degree between visible and infrared detectors during optimization, guiding the adversarial patch to iteratively reduce the predicted scores of the detectors in both modalities. Furthermore, we propose an affine-transformation-based enhancement strategy that makes the learnable shape robust to various viewing angles, mitigating the shape deformation caused by different shooting angles in the real world. Our method is evaluated against several state-of-the-art object detectors, achieving an Attack Success Rate (ASR) of over 80%. We also demonstrate the effectiveness of our approach in physical-world scenarios under various settings, including different angles, distances, postures, and scenes, for both visible and infrared sensors.
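To make the affine-transformation idea concrete, the following is a minimal sketch rather than the method evaluated here. It assumes the patch shape is represented as an (N, 2) array of boundary points and that the sampled transformations are rotation, shear, and uniform scaling. The function names and all parameter ranges are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_affine(rng, max_rot_deg=30.0, scale_range=(0.8, 1.2), max_shear=0.2):
    """Sample a 2x2 linear map (rotation, shear, uniform scale).

    All ranges are illustrative assumptions, not values from the paper.
    """
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = rng.uniform(*scale_range)
    shear_k = rng.uniform(-max_shear, max_shear)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    shear = np.array([[1.0, shear_k],
                      [0.0, 1.0]])
    return (rot @ shear) * scale

def transform_shape(points, matrix, translation=(0.0, 0.0)):
    """Apply the map to boundary points about their centroid, then translate."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    return (points - centroid) @ matrix.T + centroid + np.asarray(translation)

def polygon_area(points):
    """Shoelace formula for the area enclosed by ordered boundary points."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical patch shape: a unit square given by its boundary points.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
matrix = random_affine(rng)
warped = transform_shape(square, matrix)
```

In an optimization loop, one would typically sample a fresh transformation at each step and average the detector loss over several warped copies of the shape, so that the learned shape does not depend on a single viewing angle.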