In this work, we present a robotic solution for automating wall construction. To that end, we develop an end-to-end visual perception framework that quickly detects and localizes bricks in clutter. We further propose a computationally lightweight method of brick pose estimation that builds on these detections. Unlike YOLO and SSD, which predict upright bounding boxes, the proposed detection network predicts a rotated box, so that the object occupies a larger fraction of the predicted box region. We evaluate the framework using precision (P), recall (R), and mean average precision (mAP), and observe that, for our task, the proposed scheme outperforms the upright bounding box detectors. Finally, we deploy the visual perception framework on a robotic system equipped with a UR5 robot manipulator and demonstrate that the system can successfully replicate a simplified version of the wall-building task autonomously.
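The benefit of a rotated box over an upright one can be illustrated with a small geometric sketch. It is not the proposed network, only the underlying arithmetic: for an idealized rectangular brick face of width `w` and height `h` rotated by `theta_deg` in the image plane, it computes the fraction of the tightest upright box that the brick actually covers. The function names and the 2:1 brick proportions are assumptions chosen for illustration; a perfectly fitted rotated box would have a fill ratio of 1 by construction.

```python
import math

def upright_box_size(w, h, theta_deg):
    """Width and height of the tightest axis-aligned box around a
    w x h rectangle rotated by theta_deg (illustrative helper)."""
    t = math.radians(theta_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return w * c + h * s, w * s + h * c

def fill_ratio(w, h, theta_deg):
    """Fraction of the upright box area occupied by the rectangle."""
    box_w, box_h = upright_box_size(w, h, theta_deg)
    return (w * h) / (box_w * box_h)

if __name__ == "__main__":
    # Hypothetical brick face with a 2:1 aspect ratio.
    for angle in (0, 15, 30, 45, 60, 90):
        print(f"{angle:3d} deg  fill ratio = {fill_ratio(2.0, 1.0, angle):.3f}")
```

Under these assumptions, the upright box around a 2:1 rectangle at 45 degrees is only 4/9 (about 44%) brick, with the remainder being background or neighbouring bricks, which is the situation a rotated box avoids.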