A conventional approach to computed tomography (CT) or cone beam CT (CBCT) metal artifact reduction is to replace the X-ray projection data within the metal trace with synthesized data. However, existing projection or sinogram completion methods cannot always produce anatomically consistent information to fill the metal trace, and thus, when the metallic implant is large, significant secondary artifacts are often introduced. In this work, we propose to replace metal artifact affected regions with anatomically consistent content through joint projection-sinogram correction as well as adversarial learning. To handle the metallic implants of diverse shapes and large sizes, we also propose a novel mask pyramid network that enforces the mask information across the network's encoding layers and a mask fusion loss that reduces early saturation of adversarial training. Our experimental results show that the proposed projection-sinogram correction designs are effective and our method recovers information from the metal traces better than the state-of-the-art methods.