Qingyu Li

SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning

Apr 01, 2025

Xiaole Xian, Zhichao Liao, Qingyu Li, Wenyu Qin, Pengfei Wan, Weicheng Xie, Long Zeng, Linlin Shen, Pingfa Feng

Abstract: While fine-tuning pre-trained Text-to-Image (T2I) models on portrait datasets enables attribute customization, existing methods suffer from Semantic Pollution that compromises the original model's behavior and prevents incremental learning. To address this, we propose SPF-Portrait, a pioneering work to purely understand customized semantics while eliminating semantic pollution in text-driven portrait customization. In our SPF-Portrait, we propose a dual-path pipeline that introduces the original model as a reference for the conventional fine-tuning path. Through contrastive learning, we ensure adaptation to target attributes and purposefully align other unrelated attributes with the original portrait. We introduce a novel Semantic-Aware Fine Control Map, which represents the precise response regions of the target semantics, to spatially guide the alignment process between the contrastive paths. This alignment process not only effectively preserves the performance of the original model but also avoids over-alignment. Furthermore, we propose a novel response enhancement mechanism to reinforce the performance of target attributes, while mitigating representation discrepancy inherent in direct cross-modal supervision. Extensive experiments demonstrate that SPF-Portrait achieves state-of-the-art performance.

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

Mar 31, 2025

Zhichao Liao, Xiaokun Liu, Wenyu Qin, Qingyu Li, Qiulin Wang, Pengfei Wan, Di Zhang, Long Zeng, Pingfa Feng

Abstract: Image Aesthetic Assessment (IAA) is a long-standing and challenging research task. However, its subset, Human Image Aesthetic Assessment (HIAA), has been scarcely explored, even though HIAA is widely used in social media, AI workflows, and related domains. To bridge this research gap, our work pioneers a holistic implementation framework tailored for HIAA. Specifically, we introduce HumanBeauty, the first dataset purpose-built for HIAA, which comprises 108K high-quality human images with manual annotations. To achieve comprehensive and fine-grained HIAA, 50K human images are manually collected through a rigorous curation process and annotated using our 12-dimensional aesthetic standard, while the remaining 58K with overall aesthetic labels are systematically filtered from public datasets. Based on the HumanBeauty database, we propose HumanAesExpert, a powerful Vision Language Model for aesthetic evaluation of human images. We design an Expert head to incorporate human knowledge of aesthetic sub-dimensions while jointly utilizing the Language Modeling (LM) and Regression heads. This approach empowers our model to achieve superior proficiency in both overall and fine-grained HIAA. Furthermore, we introduce a MetaVoter, which aggregates scores from all three heads, to effectively balance the capabilities of each head, thereby realizing improved assessment precision. Extensive experiments demonstrate that our HumanAesExpert models deliver significantly better performance in HIAA than other state-of-the-art models. Our datasets, models, and code are publicly released to advance the HIAA community. Project webpage: https://humanaesexpert.github.io/HumanAesExpert/

Global OpenBuildingMap -- Unveiling the Mystery of Global Buildings

Apr 22, 2024

Xiao Xiang Zhu, Qingyu Li, Yilei Shi, Yuanyuan Wang, Adam Stewart, Jonathan Prexl

Abstract: Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To this end, by using a big data analytics approach and nearly 800,000 satellite images, we generated the highest resolution and highest accuracy building map ever created: the Global OpenBuildingMap (Global OBM). A joint analysis of building maps and solar potentials indicates that rooftop solar energy can supply the global energy consumption need at a reasonable cost. Specifically, if solar panels were placed on the roofs of all buildings, they could supply 1.1-3.3 times -- depending on the efficiency of the solar device -- the global energy consumption in 2020, which is the year with the highest consumption on record. We also identified a clear geospatial correlation between building areas and key socioeconomic variables, which indicates our global building map can serve as an important input to modeling global socioeconomic needs and drivers.

AIO2: Online Correction of Object Labels for Deep Learning with Incomplete Annotation in Remote Sensing Image Segmentation

Mar 03, 2024

Chenying Liu, Conrad M Albrecht, Yi Wang, Qingyu Li, Xiao Xiang Zhu

Abstract: While the volume of remote sensing data is increasing daily, deep learning in Earth Observation faces a lack of accurate annotations for supervised optimization. Crowdsourcing projects such as OpenStreetMap distribute the annotation load to their community. However, such annotation inevitably generates noise due to insufficient control of label quality, a lack of annotators, and frequent changes of the Earth's surface as a result of natural disasters and urban development, among many other factors. We present Adaptively trIggered Online Object-wise correction (AIO2) to address annotation noise induced by incomplete label sets. AIO2 features an Adaptive Correction Trigger (ACT) module that avoids label correction when the model under- or overfits during training, and an Online Object-wise Correction (O2C) methodology that employs spatial information for automated label modification. AIO2 utilizes a mean teacher model to enhance training robustness with noisy labels, both to stabilize the training accuracy curve for fitting in ACT and to provide pseudo labels for correction in O2C. Moreover, O2C is implemented online without the need to store updated labels every training epoch. We validate our approach on two building footprint segmentation datasets with different spatial resolutions. Experimental results with varying degrees of building label noise demonstrate the robustness of AIO2. Source code will be available at https://github.com/zhu-xlab/AIO2.git.
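
The abstract mentions a mean teacher model without giving details. In the usual mean teacher formulation, the teacher's weights are an exponential moving average (EMA) of the student's weights. The sketch below illustrates only that update rule; the function name `ema_update`, the dictionary representation of weights, and the decay value are assumptions for illustration and are not taken from the AIO2 code.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` map parameter names to floats (a stand-in
    for real tensors). `decay` is a hypothetical default, not a value
    reported in the abstract.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# One update step: the teacher moves a small fraction toward the student.
teacher = {"w": 0.0}
student = {"w": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher changes slowly, its predictions tend to be smoother across training steps than the student's, which is consistent with the stabilizing role the abstract describes.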

* This work has been accepted by IEEE Transactions on Geoscience and Remote Sensing (TGRS)

Semi-Supervised Building Footprint Generation with Feature and Output Consistency Training

May 17, 2022

Qingyu Li, Yilei Shi, Xiao Xiang Zhu

Abstract: Accurate and reliable building footprint maps are vital to urban planning and monitoring, and most existing approaches rely on convolutional neural networks (CNNs) for building footprint generation. However, one limitation of these methods is that they require strong supervisory information from massive annotated samples for network learning. State-of-the-art semi-supervised semantic segmentation networks with consistency training can help to deal with this issue by leveraging a large amount of unlabeled data, which encourages the consistency of model output under data perturbation. Considering that rich information is also encoded in feature maps, we propose to integrate the consistency of both features and outputs in the end-to-end network training of unlabeled samples, which imposes additional constraints. Prior semi-supervised semantic segmentation networks have established the cluster assumption, in which the decision boundary should lie in regions of low sample density. In this work, we observe that for building footprint generation, the low-density regions are more apparent at the intermediate feature representations within the encoder than at the encoder's input or output. Therefore, we propose an instruction to assign the perturbation to the intermediate feature representations within the encoder, which considers the spatial resolution of the input remote sensing imagery and the mean size of individual buildings in the study area. The proposed method is evaluated on three datasets with different resolutions: Planet dataset (3 m/pixel), Massachusetts dataset (1 m/pixel), and Inria dataset (0.3 m/pixel). Experimental results show that the proposed approach extracts more complete building structures and alleviates omission errors.

Instance segmentation of buildings using keypoints

Jun 06, 2020

Qingyu Li, Lichao Mou, Yuansheng Hua, Yao Sun, Pu Jin, Yilei Shi, Xiao Xiang Zhu

Abstract: Building segmentation is of great importance in the task of remote sensing imagery interpretation. However, the existing semantic segmentation and instance segmentation methods often lead to segmentation masks with blurred boundaries. In this paper, we propose a novel instance segmentation network for building segmentation in high-resolution remote sensing images. More specifically, we consider segmenting an individual building as detecting several keypoints. The detected keypoints are subsequently reformulated as a closed polygon, which is the semantic boundary of the building. By doing so, the sharp boundary of the building can be preserved. Experiments are conducted on the selected Aerial Imagery for Roof Segmentation (AIRS) dataset, and our method achieves better performance in both quantitative and qualitative results compared with the state-of-the-art methods. Our network is a bottom-up instance segmentation method that preserves geometric details well.
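
The abstract does not say how detected keypoints are ordered into a closed polygon. As a minimal sketch of the general idea, assuming the keypoints of one building are already grouped and the footprint is roughly convex, the points can be sorted by angle around their centroid. Both function names are hypothetical, and this is not the authors' procedure.

```python
import math


def keypoints_to_polygon(keypoints):
    """Order (x, y) keypoints counter-clockwise around their centroid.

    Assumption: the footprint is star-shaped with respect to the
    centroid (true for convex shapes); concave buildings would need
    a different ordering rule.
    """
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    return sorted(keypoints, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def polygon_area(polygon):
    """Shoelace formula; the last vertex connects back to the first."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Four corner keypoints given in an order that would self-intersect.
corners = [(0, 0), (2, 2), (2, 0), (0, 2)]
footprint = keypoints_to_polygon(corners)
area = polygon_area(footprint)
```

Representing the building as a short list of vertices rather than a pixel mask is what keeps the edges straight and the corners sharp.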

Building Footprint Generation by Integrating Convolution Neural Network with Feature Pairwise Conditional Random Field (FPCRF)

Feb 11, 2020

Qingyu Li, Yilei Shi, Xin Huang, Xiao Xiang Zhu

Abstract: Building footprint maps are vital to many remote sensing applications, such as 3D building modeling, urban planning, and disaster management. Due to the complexity of buildings, the accurate and reliable generation of building footprints from remote sensing imagery is still a challenging task. In this work, an end-to-end building footprint generation approach that integrates a convolution neural network (CNN) and a graph model is proposed. The CNN serves as the feature extractor, while the graph model takes spatial correlation into consideration. Moreover, we propose to implement the feature pairwise conditional random field (FPCRF) as a graph model to preserve sharp boundaries and fine-grained segmentation. Experiments are conducted on four different datasets: (1) Planetscope satellite imagery of the cities of Munich, Paris, Rome, and Zurich; (2) ISPRS benchmark data from the city of Potsdam; (3) Dstl Kaggle dataset; and (4) Inria Aerial Image Labeling data of Austin, Chicago, Kitsap County, Western Tyrol, and Vienna. It is found that the proposed end-to-end building footprint generation framework with the FPCRF as the graph model further improves the accuracy of building footprint generation over using only a CNN, which is the current state of the art.

Building Segmentation through a Gated Graph Convolutional Neural Network with Deep Structured Feature Embedding

Nov 08, 2019

Yilei Shi, Qingyu Li, Xiao Xiang Zhu

Abstract: Automatic building extraction from optical imagery remains a challenge due to, for example, the complexity of building shapes. Semantic segmentation is an efficient approach for this task. The latest development in deep convolutional neural networks (DCNNs) has made accurate pixel-level classification tasks possible. Yet one central issue remains: the precise delineation of boundaries. Deep architectures generally fail to produce fine-grained segmentation with accurate boundaries due to their progressive down-sampling. Hence, we introduce a generic framework to overcome the issue, integrating the graph convolutional network (GCN) and deep structured feature embedding (DSFE) into an end-to-end workflow. Furthermore, instead of using a classic graph convolutional neural network, we propose a gated graph convolutional network, which enables the refinement of weak and coarse semantic predictions to generate sharp borders and fine-grained pixel-level classification. Taking the semantic segmentation of building footprints as a practical example, we compared different feature embedding architectures and graph neural networks. Our proposed framework with the new GCN architecture outperforms state-of-the-art approaches. Although our main task in this work is building footprint extraction, the proposed method can be generally applied to other binary or multi-label segmentation tasks.

Building Footprint Generation Using Improved Generative Adversarial Networks

Oct 26, 2018

Yilei Shi, Qingyu Li, Xiao Xiang Zhu

Abstract: Building footprint information is an essential ingredient for 3-D reconstruction of urban models. The automatic generation of building footprints from satellite images presents a considerable challenge due to the complexity of building shapes. In this work, we propose improved generative adversarial networks (GANs) for the automatic generation of building footprints from satellite images. We use a conditional GAN with a cost function derived from the Wasserstein distance and add a gradient penalty term. The results indicate that the proposed method can significantly improve the quality of building footprint generation compared to conditional generative adversarial networks, the U-Net, and other networks. In addition, our method removes nearly all hyperparameter tuning.
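
A cost function derived from the Wasserstein distance with a gradient penalty is commonly written as a critic loss of the form D(fake) - D(real) + lambda * (||grad D(x_hat)|| - 1)^2, where x_hat is a random interpolation between a real and a generated sample. The toy sketch below evaluates that loss for a single pair using finite differences. It omits the conditioning on the input image, uses plain Python lists instead of tensors, and takes lambda = 10 as an assumed default; the function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def numeric_grad(critic, x, h=1e-5):
    """Central-difference gradient of a scalar critic at point x (a list)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((critic(up) - critic(down)) / (2 * h))
    return grad


def critic_loss(critic, real, fake, lam=10.0, rng=random):
    """Wasserstein critic loss with gradient penalty for one sample pair.

    `lam` is an assumed penalty weight. A real training loop would use
    automatic differentiation and average over a batch.
    """
    eps = rng.random()  # interpolation coefficient in [0, 1)
    mixed = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    grad_norm = math.sqrt(sum(g * g for g in numeric_grad(critic, mixed)))
    return critic(fake) - critic(real) + lam * (grad_norm - 1.0) ** 2


# A linear critic whose gradient has norm 1 incurs (almost) no penalty.
unit_critic = lambda x: 0.6 * x[0] + 0.8 * x[1]
loss = critic_loss(unit_critic, real=[1.0, 1.0], fake=[0.0, 0.0])
```

The penalty pushes the critic's gradient norm toward 1 along lines between real and generated samples, which is how this family of losses softly enforces the Lipschitz constraint that the Wasserstein formulation requires.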

* 5 pages
