Abstract:Text attribute person search aims to find specific pedestrians through given textual attributes, which is very meaningful in the scene of searching for designated pedestrians through witness descriptions. The key challenge is the significant modality gap between textual attributes and images. Previous methods focused on achieving explicit representation and alignment through unimodal pre-trained models. Nevertheless, the absence of inter-modality correspondence in these models may lead to distortions in the local information of intra-modality. Moreover, these methods only considered the alignment of inter-modality and ignored the differences between different attribute categories. To mitigate the above problems, we propose an Attribute-Aware Implicit Modality Alignment (AIMA) framework to learn the correspondence of local representations between textual attributes and images and combine global representation matching to narrow the modality gap. Firstly, we introduce the CLIP model as the backbone and design prompt templates to transform attribute combinations into structured sentences. This facilitates the model's ability to better understand and match image details. Next, we design a Masked Attribute Prediction (MAP) module that predicts the masked attributes after the interaction of image and masked textual attribute features through multi-modal interaction, thereby achieving implicit local relationship alignment. Finally, we propose an Attribute-IoU Guided Intra-Modal Contrastive (A-IoU IMC) loss, aligning the distribution of different textual attributes in the embedding space with their IoU distribution, achieving better semantic arrangement. Extensive experiments on the Market-1501 Attribute, PETA, and PA100K datasets show that the performance of our proposed method significantly surpasses the current state-of-the-art methods.
Abstract:Although the U-Net architecture has been extensively used for segmentation of medical images, we address two of its shortcomings in this work. Firstly, the accuracy of vanilla U-Net degrades when the target regions for segmentation exhibit significant variations in shape and size. Even though the U-Net already possesses some capability to analyze features at various scales, we propose to explicitly add multi-scale feature maps in each convolutional module of the U-Net encoder to improve segmentation of histology images. Secondly, the accuracy of a U-Net model also suffers when the annotations for supervised learning are noisy or incomplete. This can happen due to the inherent difficulty for a human expert to identify and delineate all instances of specific pathology very precisely and accurately. We address this challenge by introducing auxiliary confidence maps that emphasize less on the boundaries of the given target regions. Further, we utilize the bootstrapping properties of the deep network to address the missing annotation problem intelligently. In our experiments on a private dataset of breast cancer lymph nodes, where the primary task was to segment germinal centres and sinus histiocytosis, we observed substantial improvement over a U-Net baseline based on the two proposed augmentations.
Abstract:Considering the performance of intelligent task during signal exchange can help the communication system to automatically select those semantic parts which are helpful to perform the target task for compression and reconstruction, which can both greatly reduce the redundancy in signal and ensure the performance of the task. The traditional communication system based on rate-distortion theory treats all the information in the signal equally, but ignores their different importance to accomplish the task, which leads to waste of communication resources. In this paper, combined with the information bottleneck method, we present an extended rate-distortion theory which considers both concise representation and semantic distortion. Based on this theory, a task-oriented semantic image communication system is proposed. In order to verify that the proposed system can achieve performance improvement on different intelligent tasks, we apply the basic system trained with classification task to the system with object detection as the target task. The experimental results demonstrate that the proposed method outperforms the traditional and multi-task based communication system in terms of task performance at the same signal compression degree and noise interference degree. Furthermore, it is necessary to consider a compromise between rate-distortion theory and information bottleneck theory by comparing the pure rate-distortion scheme and the pure IB scheme.
Abstract:A radical paradigm shift of wireless networks from ``connected things'' to ``connected intelligence'' undergoes, which coincides with the Shanno and Weaver's envisions: Communications will transform from the technical level to the semantic level. This article proposes a semantic communication method with artificial intelligence tasks (SC-AIT). First, the architecture of SC-AIT is elaborated. Then, based on the proposed architecture, we implement SC-AIT for a image classifications task. A prototype of SC-AIT is also established for surface defect detection, is conducted. Experimental results show that SC-AIT has much lower bandwidth requirements, and can achieve more than $40\%$ classification accuracy gains compared with the communications at the technical level. Future trends and key challenges for semantic communications are also identified.
Abstract:Approximation fixpoint theory (AFT) provides an algebraic framework for the study of fixpoints of operators on bilattices and has found its applications in characterizing semantics for various classes of logic programs and nonmonotonic languages. In this paper, we show one more application of this kind: the alternating fixpoint operator by Knorr et al. for the study of the well-founded semantics for hybrid MKNF knowledge bases is in fact an approximator of AFT in disguise, which, thanks to the power of abstraction of AFT, characterizes not only the well-founded semantics but also two-valued as well as three-valued semantics for hybrid MKNF knowledge bases. Furthermore, we show an improved approximator for these knowledge bases, of which the least stable fixpoint is information richer than the one formulated from Knorr et al.'s construction. This leads to an improved computation for the well-founded semantics. This work is built on an extension of AFT that supports consistent as well as inconsistent pairs in the induced product bilattice, to deal with inconsistencies that arise in the context of hybrid MKNF knowledge bases. This part of the work can be considered generalizing the original AFT from symmetric approximators to arbitrary approximators.
Abstract:Hybrid MKNF knowledge bases have been considered one of the dominant approaches to combining open world ontology languages with closed world rule-based languages. Currently, the only known inference methods are based on the approach of guess-and-verify, while most modern SAT/ASP solvers are built under the DPLL architecture. The central impediment here is that it is not clear what constitutes a constraint propagator, a key component employed in any DPLL-based solver. In this paper, we address this problem by formulating the notion of unfounded sets for nondisjunctive hybrid MKNF knowledge bases, based on which we propose and study two new well-founded operators. We show that by employing a well-founded operator as a constraint propagator, a sound and complete DPLL search engine can be readily defined. We compare our approach with the operator based on the alternating fixpoint construction by Knorr et al [2011] and show that, when applied to arbitrary partial partitions, the new well-founded operators not only propagate more truth values but also circumvent the non-converging behavior of the latter. In addition, we study the possibility of simplifying a given hybrid MKNF knowledge base by employing a well-founded operator, and show that, out of the two operators proposed in this paper, the weaker one can be applied for this purpose and the stronger one cannot. These observations are useful in implementing a grounder for hybrid MKNF knowledge bases, which can be applied before the computation of MKNF models. The paper is under consideration for acceptance in TPLP.