Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yufeng Huang

Structure-CLIP: Enhance Multi-modal Language Representations with Structure Knowledge

May 06, 2023

Yufeng Huang, Jiji Tang, Zhuo Chen, Rongsheng Zhang, Xinfeng Zhang, Weijie Chen, Zeng Zhao, Tangjie Lv, Zhipeng Hu, Wen Zhang

Abstract:Large-scale vision-language pre-training has shown promising advances on various downstream tasks and achieved significant performance in multi-modal understanding and generation tasks. However, existing methods often perform poorly on image-text matching tasks that require a detailed semantics understanding of the text. Although there have been some works on this problem, they do not sufficiently exploit the structural knowledge present in sentences to enhance multi-modal language representations, which leads to poor performance. In this paper, we present an end-to-end framework Structure-CLIP, which integrates latent detailed semantics from the text to enhance fine-grained semantic representations. Specifically, (1) we use scene graphs in order to pay more attention to the detailed semantic learning in the text and fully explore structured knowledge between fine-grained semantics, and (2) we utilize the knowledge-enhanced framework with the help of the scene graph to make full use of representations of structured knowledge. To verify the effectiveness of our proposed method, we pre-trained our models with the aforementioned approach and conduct experiments on different downstream tasks. Numerical results show that Structure-CLIP can often achieve state-of-the-art performance on both VG-Attribution and VG-Relation datasets. Extensive experiments show its components are effective and its predictions are interpretable, which proves that our proposed method can enhance detailed semantic representation well.

* Work in progress

Via

Access Paper or Ask Questions

Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer

Mar 03, 2023

Wen Zhang, Yushan Zhu, Mingyang Chen, Yuxia Geng, Yufeng Huang, Yajing Xu, Wenting Song, Huajun Chen

Abstract:Knowledge graphs (KG) are essential background knowledge providers in many tasks. When designing models for KG-related tasks, one of the key tasks is to devise the Knowledge Representation and Fusion (KRF) module that learns the representation of elements from KGs and fuses them with task representations. While due to the difference of KGs and perspectives to be considered during fusion across tasks, duplicate and ad hoc KRF modules design are conducted among tasks. In this paper, we propose a novel knowledge graph pretraining model KGTransformer that could serve as a uniform KRF module in diverse KG-related tasks. We pretrain KGTransformer with three self-supervised tasks with sampled sub-graphs as input. For utilization, we propose a general prompt-tuning mechanism regarding task data as a triple prompt to allow flexible interactions between task KGs and task data. We evaluate pretrained KGTransformer on three tasks, triple classification, zero-shot image classification, and question answering. KGTransformer consistently achieves better results than specifically designed task models. Through experiments, we justify that the pretrained KGTransformer could be used off the shelf as a general and effective KRF module across KG-related tasks. The code and datasets are available at https://github.com/zjukg/KGTransformer.

* Work accepted by WWW2023

Via

Access Paper or Ask Questions

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Dec 29, 2022

Zhuo Chen, Jiaoyan Chen, Wen Zhang, Lingbing Guo, Yin Fang, Yufeng Huang, Yuxia Geng, Jeff Z. Pan, Wenting Song, Huajun Chen

Figure 1 for MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Figure 2 for MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Figure 3 for MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Figure 4 for MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Abstract:As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities like images. However, current MMEA algorithms all adopt KG-level modality fusion strategies but ignore modality differences among individual entities, hurting the robustness to potential noise involved in modalities (e.g., unidentifiable images and relations). In this paper we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, to dynamically predict the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed for addressing vague entity details. Extensive experimental results show that our model not only achieves SOTA performance on multiple training scenarios including supervised, unsupervised, iterative, and low resource, but also has limited parameters, optimistic speed, and good interpretability. Our code will be available soon.

* 12 pages, 8 figures, 8 tables

Via

Access Paper or Ask Questions

Tele-Knowledge Pre-training for Fault Analysis

Oct 20, 2022

Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song(+6 more)

Figure 1 for Tele-Knowledge Pre-training for Fault Analysis

Figure 2 for Tele-Knowledge Pre-training for Fault Analysis

Figure 3 for Tele-Knowledge Pre-training for Fault Analysis

Figure 4 for Tele-Knowledge Pre-training for Fault Analysis

Abstract:In this work, we share our experience on tele-knowledge pre-training for fault analysis. Fault analysis is a vital task for tele-application, which should be timely and properly handled. Fault analysis is also a complex task, that has many sub-tasks. Solving each task requires diverse tele-knowledge. Machine log data and product documents contain part of the tele-knowledge. We create a Tele-KG to organize other tele-knowledge from experts uniformly. With these valuable tele-knowledge data, in this work, we propose a tele-domain pre-training model KTeleBERT and its knowledge-enhanced version KTeleBERT, which includes effective prompt hints, adaptive numerical data encoding, and two knowledge injection paradigms. We train our model in two stages: pre-training TeleBERT on 20 million telecommunication corpora and re-training TeleBERT on 1 million causal and machine corpora to get the KTeleBERT. Then, we apply our models for three tasks of fault analysis, including root-cause analysis, event association prediction, and fault chain tracing. The results show that with KTeleBERT, the performance of task models has been boosted, demonstrating the effectiveness of pre-trained KTeleBERT as a model containing diverse tele-knowledge.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Aug 19, 2022

Yufeng Huang, Zhuo Chen, Wen Zhang, Jiaoyan Chen, Jeff Z. Pan, Zhen Yao, Yujie Xie, Huajun Chen

Figure 1 for Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Figure 2 for Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Figure 3 for Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Figure 4 for Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Abstract:Multi-modal aspect-based sentiment classification (MABSC) is an emerging classification task that aims to classify the sentiment of a given target such as a mentioned entity in data with different modalities. In typical multi-modal data with text and image, previous approaches do not make full use of the fine-grained semantics of the image, especially in conjunction with the semantics of the text and do not fully consider modeling the relationship between fine-grained image information and target, which leads to insufficient use of image and inadequate to identify fine-grained aspects and opinions. To tackle these limitations, we propose a new framework SeqCSG including a method to construct sequential cross-modal semantic graphs and an encoder-decoder model. Specifically, we extract fine-grained information from the original image, image caption, and scene graph, and regard them as elements of the cross-modal semantic graph as well as tokens from texts. The cross-modal semantic graph is represented as a sequence with a multi-modal visible matrix indicating relationships between elements. In order to effectively utilize the cross-modal semantic graph, we propose an encoder-decoder method with a target prompt template. Experimental results show that our approach outperforms existing methods and achieves the state-of-the-art on two standard datasets MABSC. Further analysis demonstrates the effectiveness of each component and our model can implicitly learn the correlation between the target and fine-grained information of the image.

* Work in progress

Via

Access Paper or Ask Questions

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Jul 26, 2022

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Pan, Ningyu Zhang, Wen Zhang

Figure 1 for LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Figure 2 for LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Figure 3 for LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Figure 4 for LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Abstract:Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which relies on external knowledge. Most existing methods exploit pre-trained language models or/and unstructured text, but the knowledge in these resources are often incomplete and noisy. Some methods prefer to use knowledge graphs (KGs) which often have intensive structured knowledge, but the research is still quite preliminary. In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection. To effectively incorporate an external KG, we transfer triples into text and propose a late injection mechanism. Finally we address VQA as a text generation task with an effective encoder-decoder paradigm. In the evaluation with OKVQA datasets, our method achieves state-of-the-art results.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Jul 04, 2022

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Wenting Song, Huajun Chen

Figure 1 for DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Figure 2 for DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Figure 3 for DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Figure 4 for DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Abstract:Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training, often utilizing additional semantic information (a.k.a. side information) to bridge the training (seen) classes and the unseen classes. One of the most effective and widely used semantic information for zero-shot image classification are attributes which are annotations for class-level visual characteristics. However, due to the shortage of fine-grained annotations, the attribute imbalance and co-occurrence, the current methods often fail to discriminate those subtle visual distinctions between images, which limits their performances. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from the pretrained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images, (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance, and (3) proposed a multi-task learning policy for considering multi-model objectives. With extensive experiments on three standard ZSL benchmarks and a knowledge graph equipped ZSL benchmark, we find that DUET can often achieve state-of-the-art performance, its components are effective and its predictions are interpretable.

* Work in progress

Via

Access Paper or Ask Questions

Disentangled Ontology Embedding for Zero-shot Learning

Jun 08, 2022

Yuxia Geng, Jiaoyan Chen, Wen Zhang, Yajing Xu, Zhuo Chen, Jeff Z. Pan, Yufeng Huang, Feiyu Xiong, Huajun Chen

Figure 1 for Disentangled Ontology Embedding for Zero-shot Learning

Figure 2 for Disentangled Ontology Embedding for Zero-shot Learning

Figure 3 for Disentangled Ontology Embedding for Zero-shot Learning

Figure 4 for Disentangled Ontology Embedding for Zero-shot Learning

Abstract:Knowledge Graph (KG) and its variant of ontology have been widely used for knowledge representation, and have shown to be quite effective in augmenting Zero-shot Learning (ZSL). However, existing ZSL methods that utilize KGs all neglect the intrinsic complexity of inter-class relationships represented in KGs. One typical feature is that a class is often related to other classes in different semantic aspects. In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects. We also contribute a new ZSL framework named DOZSL, which contains two new ZSL solutions based on generative models and graph propagation models, respectively, for effectively utilizing the disentangled ontology embeddings. Extensive evaluations have been conducted on five benchmarks across zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). DOZSL often achieves better performance than the state-of-the-art, and its components have been verified by ablation studies and case studies. Our codes and datasets are available at https://github.com/zjukg/DOZSL.

* Accepted by KDD'22

Via

Access Paper or Ask Questions

Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning

Mar 15, 2022

Xiang Chen, Zhentao Fan, Zhuoran Zheng, Yufeng Li, Yufeng Huang, Longgang Dai, Caihua Kong, Pengpeng Li

Figure 1 for Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning

Figure 2 for Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning

Figure 3 for Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning

Figure 4 for Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning

Abstract:We present an effective unpaired learning based image dehazing network from an unpaired set of clear and hazy images. This paper provides a new perspective to treat image dehazing as a two-class separated factor disentanglement task, i.e, the task-relevant factor of clear image reconstruction and the task-irrelevant factor of haze-relevant distribution. To achieve the disentanglement of these two-class factors in deep feature space, contrastive learning is introduced into a CycleGAN framework to learn disentangled representations by guiding the generated images to be associated with latent factors. With such formulation, the proposed contrastive disentangled dehazing method (CDD-GAN) first develops negative generators to cooperate with the encoder network to update alternately, so as to produce a queue of challenging negative adversaries. Then these negative adversaries are trained end-to-end together with the backbone representation network to enhance the discriminative information and promote factor disentanglement performance by maximizing the adversarial contrastive loss. During the training, we further show that hard negative examples can suppress the task-irrelevant factors and unpaired clear exemples can enhance the task-relevant factors, in order to better facilitate haze removal and help image restoration. Extensive experiments on both synthetic and real-world datasets demonstrate that our method performs favorably against existing state-of-the-art unpaired dehazing approaches.

Via

Access Paper or Ask Questions

NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Feb 25, 2022

Wen Zhang, Xiangnan Chen, Zhen Yao, Mingyang Chen, Yushan Zhu, Hongtao Yu, Yufeng Huang, Zezhong Xu, Yajing Xu, Ningyu Zhang(+3 more)

Figure 1 for NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Figure 2 for NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Figure 3 for NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Figure 4 for NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Abstract:NeuralKG is an open-source Python-based library for diverse representation learning of knowledge graphs. It implements three different series of Knowledge Graph Embedding (KGE) methods, including conventional KGEs, GNN-based KGEs, and Rule-based KGEs. With a unified framework, NeuralKG successfully reproduces link prediction results of these methods on benchmarks, freeing users from the laborious task of reimplementing them, especially for some methods originally written in non-python programming languages. Besides, NeuralKG is highly configurable and extensible. It provides various decoupled modules that can be mixed and adapted to each other. Thus with NeuralKG, developers and researchers can quickly implement their own designed models and obtain the optimal training methods to achieve the best performance efficiently. We built an website in http://neuralkg.zjukg.cn to organize an open and shared KG representation learning community. The source code is all publicly released at https://github.com/zjukg/NeuralKG.

* work in progress

Via

Access Paper or Ask Questions