Shengjie Li

UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction

Dec 15, 2025

Siyuan Yao, Dongxiu Liu, Taotao Li, Shengjie Li, Wenqi Ren, Xiaochun Cao

Abstract: Building extraction from remote sensing images is a challenging task due to the complex structure variations of the buildings. Existing methods employ convolutional or self-attention blocks to capture the multi-scale features in the segmentation models, but the inherent gap in the feature pyramids and insufficient global-local feature integration lead to inaccurate, ambiguous extraction results. To address this issue, we present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet), which exploits high-quality global-local visual semantics under the guidance of uncertainty modeling. Specifically, we propose a novel cooperative encoder, which adopts hybrid CNN and transformer layers at different stages to capture the local and global visual semantics, respectively. An intermediate cooperative interaction block (CIB) is designed to narrow the gap between the local and global features as the network becomes deeper. Afterwards, we propose a Global-Local Fusion (GLF) module to complementarily fuse the global and local representations. Moreover, to mitigate the segmentation ambiguity in uncertain regions, we propose an Uncertainty-Aggregated Decoder (UAD) that explicitly estimates the pixel-wise uncertainty to enhance the segmentation accuracy. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods. Our code is available at https://github.com/Dstate/UAGLNet

* IEEE TGRS
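
As a rough illustration of what "pixel-wise uncertainty" can mean for a binary building mask, the sketch below scores each pixel by the binary entropy of its predicted foreground probability. This is a generic, commonly used measure chosen here as an assumption; it is not taken from the UAGLNet paper or its repository, and the function name `pixel_uncertainty` and the toy logits are hypothetical.

```python
import numpy as np

def pixel_uncertainty(logits):
    """Return (foreground probability, binary entropy in bits) per pixel.

    Assumption: uncertainty is measured as the entropy of the sigmoid output.
    The actual UAD formulation may differ.
    """
    prob = 1.0 / (1.0 + np.exp(-logits))
    # Clip to avoid log2(0) for saturated predictions.
    p = np.clip(prob, 1e-12, 1.0 - 1e-12)
    entropy = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return prob, entropy

# Toy 2x2 "logit map": confident background, two ambiguous pixels, confident building.
logits = np.array([[-6.0, 0.0],
                   [0.5, 6.0]])
prob, entropy = pixel_uncertainty(logits)

# Pixels whose entropy exceeds a (hypothetical) threshold are flagged as uncertain.
uncertain_mask = entropy > 0.5
```

Entropy peaks at 1 bit when the probability is 0.5 and falls toward 0 for confident pixels, so the mask highlights exactly the ambiguous regions a decoder might want to treat separately.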

Multimodal Propaganda Processing

Feb 17, 2023

Vincent Ng, Shengjie Li

Abstract: Propaganda campaigns have long been used to influence public opinion by disseminating biased and/or misleading information. Despite the increasing prevalence of propaganda content on the Internet, few attempts have been made by AI researchers to analyze such content. We introduce the task of multimodal propaganda processing, where the goal is to automatically analyze propaganda content. We believe that this task presents a long-term challenge to AI researchers and that successful processing of propaganda could bring machine understanding one important step closer to human understanding. We discuss the technical challenges associated with this task and outline the steps that need to be taken to address it.

* Accepted in AAAI 2023

End-to-End Neural Discourse Deixis Resolution in Dialogue

Dec 03, 2022

Shengjie Li, Vincent Ng

Abstract: We adapt Lee et al.'s (2018) span-based entity coreference model to the task of end-to-end discourse deixis resolution in dialogue, specifically by proposing extensions to their model that exploit task-specific characteristics. The resulting model, dd-utt, achieves state-of-the-art results on the four datasets in the CODI-CRAC 2021 shared task.

* Accepted as a long paper to EMNLP 2022

Segmenting Epipolar Line

Oct 11, 2020

Shengjie Li, Qi Cai, Yuanxin Wu

Abstract: Identifying feature correspondence between two images is a fundamental procedure in three-dimensional computer vision. Usually, the feature search space is confined to the epipolar line. Using the cheirality constraint, this paper finds that the feature search space can be restricted to one of two or three segments of the epipolar line that are defined by the epipole and a so-called virtual infinity point.

* 5 pages, 6 figures
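
To make the epipolar constraint in the abstract above concrete, the sketch below builds an essential matrix from a made-up relative pose and checks that the projections of one viewing ray at several positive depths all fall on the same epipolar line. Everything here is an illustrative assumption: normalized image coordinates (identity intrinsics), the pose values, and the helper names are hypothetical. The "limit point" is the standard image of the ray's point at infinity; whether it matches the paper's "virtual infinity point" exactly is not confirmed by the abstract.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose: a point X1 in camera 1 maps to X2 = R @ X1 + t in camera 2.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.2])

E = skew(t) @ R                      # essential matrix for normalized coordinates
x1 = np.array([0.2, -0.1, 1.0])      # homogeneous feature location in image 1
line = E @ x1                        # epipolar line in image 2 (a, b, c): a*u + b*v + c = 0

def project_at_depth(depth):
    """Image-2 projection of the 3D point lying at `depth` along the ray through x1."""
    X2 = R @ (depth * x1) + t
    return X2 / X2[2]

# Every positive-depth candidate satisfies the epipolar constraint x2^T E x1 = 0.
residuals = [float(line @ project_at_depth(d)) for d in (0.5, 2.0, 50.0)]

epipole = t / t[2]                   # depth -> 0: image of camera 1's center
limit_point = (R @ x1) / (R @ x1)[2] # depth -> infinity: image of the ray's direction
```

Because only positive depths are physically valid (the cheirality constraint), the candidate matches sweep from the epipole toward the limit point rather than covering the whole line, which is the intuition behind restricting the search to a segment.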

Clue: Cross-modal Coherence Modeling for Caption Generation

May 02, 2020

Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

Abstract: We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image-caption coherence relations, we annotate 10,000 instances from publicly available image-caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also to train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.

* Accepted as a long paper to ACL 2020
