Recovering the indoor frame structure from a monocular image is an important yet challenging problem in indoor scene understanding. It becomes more difficult when the scene contains occlusions, the illumination varies, and object boundaries are weak. To overcome these difficulties, we propose a new approach based on line segment refinement with two constraints. First, the line segments are refined by four consecutive operations: reclassifying, connecting, fitting, and voting. Specifically, the reclassifying operation corrects misclassified line segments, the connecting operation joins some short line segments, the fitting operation recovers undetected key line segments with the help of the vanishing points, and the voting operation selects the line segments that converge on the frame. Second, we construct four frame models corresponding to four classes of possible shooting angles of the monocular image, and describe the properties of all frame models by enforcing cross ratio and depth constraints. The indoor frame is then constructed by fitting the refined line segments to the corresponding frame model under the two constraints, which jointly improve the accuracy of the frame. Experimental results on a collection of over 300 indoor images indicate that our algorithm can recover the frame from complex indoor scenes.
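The cross ratio constraint relies on a standard fact from projective geometry: the cross ratio of four collinear points is unchanged by a perspective projection. The abstract does not give the exact formulation used for the frame models, so the following is only a minimal sketch of that invariance. The function names `cross_ratio` and `apply_homography`, the sample points, and the matrix `H` are hypothetical and chosen purely for illustration.

```python
from math import isclose


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four collinear 2D points (a != d)."""
    # Project every point onto the direction a -> d to get signed 1D
    # coordinates along the line; the common scale factor cancels out.
    ux, uy = d[0] - a[0], d[1] - a[1]

    def t(p):
        return (p[0] - a[0]) * ux + (p[1] - a[1]) * uy

    ta, tb, tc, td = t(a), t(b), t(c), t(d)
    return ((tc - ta) * (td - tb)) / ((tc - tb) * (td - ta))


def apply_homography(h, p):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Illustrative data (assumed, not from the paper): four collinear points
# and an arbitrary projective transformation standing in for the camera.
POINTS = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
H = [
    [1.0, 0.2, 3.0],
    [0.1, 1.0, -2.0],
    [0.001, 0.002, 1.0],
]

if __name__ == "__main__":
    before = cross_ratio(*POINTS)
    after = cross_ratio(*(apply_homography(H, p) for p in POINTS))
    print(isclose(before, after, rel_tol=1e-9))
```

Because the value survives projection, a cross ratio measured along a line in the image can be compared against the value expected from a candidate frame model, which is one plausible way such a constraint could be used to reject inconsistent line segment fits.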