A robotic solution for the unmanned ground vehicles (UGVs) to execute the highly complex task of object manipulation in an autonomous mode is presented. This paper primarily focuses on developing an autonomous robotic system capable of assembling elementary blocks to build the large 3D structures in GPS-denied environments. The key contributions of this system paper are i) Designing of a deep learning-based unified multi-task visual perception system for object detection, part-detection, instance segmentation, and tracking, ii) an electromagnetic gripper design for robust grasping, and iii) system integration in which multiple system components are integrated to develop an optimized software stack. The entire mechatronic and algorithmic design of UGV for the above application is detailed in this work. The performance and efficacy of the overall system are reported through several rigorous experiments.