Compliant robotics has seen successful applications in energy-efficient locomotion and cyclic manipulation. However, the full exploitation of variable physical impedance for energy-efficient sequential movements has not been extensively addressed. This work employs a hierarchical approach that encapsulates low-level optimal control for sub-movement generation within an outer loop of iterative policy improvement, thereby leveraging the benefits of both optimal control and reinforcement learning. The framework optimizes the efficiency trade-off for minimal energy expenditure in a model-free manner by taking into account cost function weighting, variable impedance exploitation, and transition timing, which are associated with the skill of compliance. The effectiveness of the proposed method is evaluated using two consecutive reaching tasks on a variable impedance actuator. The results demonstrate significant energy savings from improving the skill of compliance, with a 30% reduction in electrical consumption measured on hardware.