Pixel-wise dense prediction tasks under weak supervision currently rely on Class Activation Maps (CAMs) to generate pseudo masks as ground truth. However, existing methods typically depend on elaborate training modules, which can introduce heavy computational overhead and complicated training procedures. In this work, Semantic Structure Aware inference (SSA) is proposed to exploit the semantic structure information hidden in different stages of a CNN-based network and generate high-quality CAMs at inference time. Specifically, a semantic structure modeling module (SSM) is first introduced to produce a class-agnostic semantic correlation representation, in which each entry denotes the affinity between one object category and all the others. This structured feature representation is then used to polish an immature CAM via a dot-product operation. Finally, the polished CAMs from different backbone stages are fused as the output. The proposed method is parameter-free and requires no training, so it can be applied to a wide range of weakly-supervised pixel-wise dense prediction tasks. Experimental results on both weakly-supervised object localization and weakly-supervised semantic segmentation demonstrate the effectiveness of the proposed method, which achieves new state-of-the-art results on both tasks.
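To make the described pipeline concrete, the following is a minimal PyTorch sketch of the inference flow: build an affinity representation from a backbone stage's features, polish the raw CAM with a dot product, and fuse the polished maps across stages. The function names (`semantic_structure_modeling`, `polish_cam`, `ssa_inference`), the use of spatial-position affinities, and the simple averaging used for fusion are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def semantic_structure_modeling(feat):
    """Build a pairwise affinity matrix from one backbone stage's features.

    feat: (B, C, H, W) feature map.
    Returns: (B, HW, HW) row-normalized affinity (one plausible instantiation).
    """
    b, c, h, w = feat.shape
    x = F.normalize(feat.flatten(2), dim=1)        # (B, C, HW), cosine-style similarity
    affinity = F.relu(torch.bmm(x.transpose(1, 2), x))  # (B, HW, HW), keep positive correlations
    return affinity / (affinity.sum(dim=-1, keepdim=True) + 1e-6)

def polish_cam(cam, affinity):
    """Refine an immature CAM by propagating it through the affinity matrix (dot product)."""
    b, k, h, w = cam.shape                          # k = number of classes
    cam_flat = cam.flatten(2)                       # (B, K, HW)
    polished = torch.bmm(cam_flat, affinity)        # (B, K, HW)
    return polished.view(b, k, h, w)

def ssa_inference(cams, stage_feats, out_size):
    """Fuse polished CAMs from several backbone stages (simple averaging here)."""
    fused = 0.0
    for cam, feat in zip(cams, stage_feats):
        cam = F.interpolate(cam, size=feat.shape[-2:], mode='bilinear', align_corners=False)
        polished = polish_cam(cam, semantic_structure_modeling(feat))
        fused = fused + F.interpolate(polished, size=out_size, mode='bilinear', align_corners=False)
    return fused / len(cams)
```

Because every step is a fixed tensor operation on features the backbone already produces, this sketch adds no learnable parameters and runs purely at inference, which is the property the abstract highlights.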