Empathy is a trait that naturally manifests in human conversation. Theoretically, the birth of empathetic responses results from conscious alignment and interaction between cognition and affection of empathy. However, existing works rely solely on a single affective aspect or model cognition and affection independently, limiting the empathetic capabilities of the generated responses. To this end, based on the commonsense cognition graph and emotional concept graph constructed involving commonsense and concept knowledge, we design a two-level strategy to align coarse-grained (between contextual cognition and contextual emotional state) and fine-grained (between each specific cognition and corresponding emotional reaction) Cognition and Affection for reSponding Empathetically (CASE). Extensive experiments demonstrate that CASE outperforms the state-of-the-art baselines on automatic and human evaluation. Our code will be released.