Automatic building extraction from aerial and satellite imagery is highly challenging due to extremely large variations of building appearances. To attack this problem, we design a convolutional network with a final stage that integrates activations from multiple preceding stages for pixel-wise prediction, and introduce the signed distance function of building boundaries as the output representation, which has an enhanced representation power. We leverage abundant building footprint data available from geographic information systems (GIS) to compile training data. The trained network achieves superior performance on datasets that are significantly larger and more complex than those used in prior work, demonstrating that the proposed method provides a promising and scalable solution for automating this labor-intensive task.