Transformers have achieved significant success in medical image segmentation, owing to their capability to capture long-range dependencies. Previous works incorporate convolutional layers into the encoder module of transformers, thereby enhancing their ability to learn local relationships among pixels. However, transformers may still suffer from limited generalization and reduced robustness, which is attributed to the insufficient spatial recovery ability of their decoders. To address this issue, we propose a decoder based on convolutional sparse vector coding, namely the CAScaded multi-layer Convolutional Sparse vector Coding DEcoder (CASCSCDE), which represents the features extracted by the encoder using sparse vectors. To validate the effectiveness of CASCSCDE, we choose the widely used TransUNet model and integrate CASCSCDE into it to establish the TransCASCSCDE architecture. Our experiments demonstrate that TransUNet with CASCSCDE significantly enhances performance on the Synapse benchmark, obtaining improvements of up to 3.15\% and 1.16\% in DICE and mIoU scores, respectively. CASCSCDE opens new ways for constructing decoders based on convolutional sparse vector coding.