The detection of ancient settlements is a key focus in landscape archaeology. Traditionally, settlements were identified through pedestrian survey, in which researchers physically traversed the landscape and recorded settlement locations. More recently, the manual identification and labeling of ancient remains in satellite imagery has increased the scale of archaeological data collection, but the process remains time-consuming and arduous. The development of self-supervised learning (e.g., contrastive learning) offers a scalable learning scheme for locating archaeological sites using unlabeled satellite and historical aerial images. However, archaeological sites are present in only a very small proportion of the landscape, and modern contrastive learning approaches typically yield inferior performance on highly imbalanced datasets, such as when identifying sparsely localized ancient urbanization over a large area from satellite images. In this work, we propose a framework to address this long-tail problem. As opposed to existing contrastive learning approaches, which typically treat the labeled and unlabeled data separately, the proposed method reformulates the learning paradigm in a semi-supervised setting to fully utilize the precious annotated data (<7% in our setting). Specifically, the highly imbalanced nature of the data is employed as prior knowledge to form pseudo negative pairs by ranking the similarities between unannotated image patches and annotated anchor images. In this study, we used 95,358 unlabeled images and 5,830 labeled images to detect ancient buildings in a long-tailed satellite image dataset. Our semi-supervised contrastive learning model achieved a promising testing balanced accuracy of 79.0%, a 3.8% improvement over state-of-the-art approaches.
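The pseudo-negative idea can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation used in this work: it assumes that embeddings for annotated anchor images and unannotated patches are already available from some encoder, that similarity is measured by cosine similarity, and that the unannotated patches least similar to an anchor are the ones treated as its pseudo negatives. The function name `select_pseudo_negatives` and the parameter `neg_fraction` are hypothetical.

```python
import numpy as np


def select_pseudo_negatives(anchor_emb, unlabeled_emb, neg_fraction=0.5):
    """Rank unlabeled patches by similarity to each anchor and keep the
    least similar ones as pseudo negatives.

    Assumption (not from the paper): because sites are rare, most unlabeled
    patches are background, so the lowest-ranked patches for a site anchor
    are treated as negatives. `neg_fraction` is a hypothetical knob.
    """
    anchors = np.asarray(anchor_emb, dtype=float)
    patches = np.asarray(unlabeled_emb, dtype=float)

    # L2-normalise so that the dot product equals cosine similarity.
    eps = 1e-12
    anchors = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + eps)
    patches = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + eps)

    # Similarity matrix of shape (n_anchors, n_unlabeled).
    sim = anchors @ patches.T

    # Number of pseudo negatives kept per anchor (at least one).
    k = max(1, int(neg_fraction * patches.shape[0]))

    # argsort is ascending, so the first k columns are the least similar.
    neg_idx = np.argsort(sim, axis=1)[:, :k]
    return neg_idx, sim
```

Each row of `neg_idx` lists, for one anchor, the unannotated patches that would be paired with it as negatives in a contrastive loss. How many patches to keep, and whether to rank per anchor or against a pooled anchor representation, are design choices this sketch does not settle.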