Understanding information cascades in networks is a fundamental issue in numerous applications. Current researches often sample cascade information into several independent paths or subgraphs to learn a simple cascade representation. However, these approaches fail to exploit the hierarchical semantic associations between different modalities, limiting their predictive performance. In this work, we propose a novel Hierarchical Information Enhancement Network (HIENet) for cascade prediction. Our approach integrates fundamental cascade sequence, user social graphs, and sub-cascade graph into a unified framework. Specifically, HIENet utilizes DeepWalk to sample cascades information into a series of sequences. It then gathers path information between users to extract the social relationships of propagators. Additionally, we employ a time-stamped graph convolutional network to aggregate sub-cascade graph information effectively. Ultimately, we introduce a Multi-modal Cascade Transformer to powerfully fuse these clues, providing a comprehensive understanding of cascading process. Extensive experiments have demonstrated the effectiveness of the proposed method.