Transformers have found broad application owing to their ability to capture long-range dependencies among inputs through attention mechanisms. The recent success of transformers has increased the need for mathematical interpretation of their underlying working mechanisms, leading to the development of a family of white-box transformer-like deep network architectures. However, designing white-box transformers with efficient three-dimensional (3D) attention remains an open challenge. In this work, we revisit the 3D orthogonal matching pursuit (3D-OMP) algorithm and show that its operation is analogous to that of a specific kind of transformer with 3D attention. We therefore build a white-box 3D-OMP-Transformer by introducing additional learnable parameters into 3D-OMP. As a transformer, its 3D attention can be mathematically interpreted through 3D-OMP; as a variant of OMP, it can learn from data to improve the matching pursuit process. Moreover, since a transformer's performance can be improved by stacking more transformer blocks, we design a cascaded 3D-OMP-Transformer with dynamic small-scale dictionaries, which improves the performance of the 3D-OMP-Transformer at low cost. We evaluate the proposed 3D-OMP-Transformer on the multi-target detection task of integrated sensing and communications (ISAC). Experimental results show that the designed 3D-OMP-Transformer outperforms current baselines.