Wilfred Ng

An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks

Oct 24, 2022

Changlong Yu, Tianyi Xiao, Lingpeng Kong, Yangqiu Song, Wilfred Ng

Abstract: Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study that replaces parsed graphs or trees with trivial ones (which carry little linguistic knowledge, e.g., a balanced tree) for tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in fully-supervised and few-shot settings. This suggests that the gains may stem less from explicit linguistic priors than from the additional feature interactions introduced by fusion layers. Hence we call for trivial graphs to be used as necessary baselines when designing advanced knowledge fusion methods in the future.

* EMNLP 2022 Main Conference
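
To make the trivial-graph baseline concrete, here is a minimal sketch of a balanced binary tree built over token positions and turned into an adjacency matrix. The abstract does not say how the trivial trees are constructed; the midpoint-splitting scheme and the names `balanced_tree_edges` and `adjacency_matrix` are assumptions made only for illustration.

```python
def balanced_tree_edges(n_tokens):
    """Return (parent, child) index pairs of a balanced binary tree.

    Hypothetical construction: the middle token of each span becomes the
    root of that span, so the tree depends only on sentence length and
    carries no parser-derived information.
    """
    edges = []

    def build(lo, hi, parent):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        if parent is not None:
            edges.append((parent, mid))
        build(lo, mid - 1, mid)
        build(mid + 1, hi, mid)

    build(0, n_tokens - 1, None)
    return edges


def adjacency_matrix(n_tokens, edges):
    """Symmetric 0/1 adjacency matrix with self-loops, as nested lists."""
    adj = [[0] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        adj[i][i] = 1
    for parent, child in edges:
        adj[parent][child] = 1
        adj[child][parent] = 1
    return adj


tokens = ["the", "cat", "sat", "on", "the", "mat", "."]
edges = balanced_tree_edges(len(tokens))
adj = adjacency_matrix(len(tokens), edges)
```

A matrix like `adj` could stand in for a parser-derived graph wherever a fusion layer expects one, which is the kind of baseline comparison the abstract argues for.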

Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering

Mar 15, 2022

Jun Gao, Wei Wang, Changlong Yu, Huan Zhao, Wilfred Ng, Ruifeng Xu

Abstract: Representations of events described in text are important for various tasks. In this work, we present SWCC: a Simultaneous Weakly supervised Contrastive learning and Clustering framework for event representation learning. SWCC learns event representations by making better use of co-occurrence information of events. Specifically, we introduce a weakly supervised contrastive learning method that allows us to consider multiple positives and multiple negatives, and a prototype-based clustering method that avoids semantically related events being pulled apart. For model training, SWCC learns representations by simultaneously performing weakly supervised contrastive learning and prototype-based clustering. Experimental results show that SWCC outperforms other baselines on Hard Similarity and Transitive Sentence Similarity tasks. In addition, a thorough analysis of the prototype-based clustering method demonstrates that the learned prototype vectors are able to implicitly capture various relations between events.

* ACL 2022
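
As a rough illustration of a contrastive objective with multiple positives and multiple negatives, the sketch below computes a generic InfoNCE-style loss averaged over the positives. It is not claimed to be the SWCC loss itself; the function name, the dot-product similarity, and the `temperature` value are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss averaged over several positives.

    Each positive is contrasted against all positives and negatives;
    this is an illustrative stand-in, not the paper's exact objective.
    """
    pos_logits = [dot(anchor, p) / temperature for p in positives]
    neg_logits = [dot(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(all_logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in all_logits))
    return -sum(x - log_denom for x in pos_logits) / len(pos_logits)


# Toy 2-d event vectors (made up for the example).
anchor = [1.0, 0.0]
close_events = [[1.0, 0.0], [0.9, 0.1]]
far_events = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = multi_positive_contrastive_loss(anchor, close_events, far_events)
loss_swapped = multi_positive_contrastive_loss(anchor, far_events, close_events)
```

The loss is lower when the positives are the events that actually lie close to the anchor, so minimizing it pulls co-occurring events together and pushes the negatives away.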

CoCoLM: COmplex COmmonsense Enhanced Language Model

Dec 31, 2020

Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng

Abstract: Large-scale pre-trained language models have demonstrated strong knowledge representation ability. However, recent studies suggest that even though these giant models contain rich simple commonsense knowledge (e.g., birds can fly and fish can swim), they often struggle with complex commonsense knowledge that involves multiple eventualities (verb-centric phrases, e.g., identifying the relationship between "Jim yells at Bob" and "Bob is upset"). To address this problem, we propose to help pre-trained language models better incorporate complex commonsense knowledge. Unlike existing fine-tuning approaches, we do not focus on a specific task and instead propose a general language model named CoCoLM. Through careful training over the large-scale eventuality knowledge graph ASER, we successfully teach pre-trained language models (i.e., BERT and RoBERTa) rich complex commonsense knowledge among eventualities. Experiments on multiple downstream commonsense tasks that require a correct understanding of eventualities demonstrate the effectiveness of CoCoLM.

When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models

Oct 10, 2020

Changlong Yu, Jialong Han, Peifeng Wang, Yangqiu Song, Hongming Zhang, Wilfred Ng, Shuming Shi

Abstract: We address hypernymy detection, i.e., whether an is-a relationship exists between words (x, y), with the help of large textual corpora. Most conventional approaches to this task are either pattern-based or distributional. Recent studies suggest that pattern-based approaches are superior if large-scale Hearst pairs are extracted and fed, with the sparsity of unseen (x, y) pairs relieved. However, they become invalid in some specific sparsity cases, where x or y is not involved in any pattern. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are well suited to complement pattern-based ones in such cases. We devise a complementary framework, under which a pattern-based and a distributional model collaborate seamlessly, each handling the cases it is better suited to. On several benchmark datasets, our framework achieves competitive improvements, and the case study shows its better interpretability.

* Accepted by EMNLP 2020 Main Conference
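
The division of labor described in the abstract can be sketched as a simple router: use a pattern-based score when both words occur in some extracted Hearst pair, and fall back to a distributional score otherwise. Everything here is a toy assumption, including the function names, the count-based pattern score, the made-up vectors, and plain cosine similarity as the distributional stand-in (cosine is symmetric, so a real distributional model would need something direction-aware).

```python
import math


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def hypernymy_score(x, y, pattern_counts, vectors):
    """Return (score, source) for the candidate pair "x is-a y".

    pattern_counts maps (hyponym, hypernym) pairs to Hearst-pattern counts.
    If x or y never appears in any extracted pair, the pattern-based model
    has nothing to say, so the distributional score is used instead.
    """
    pattern_vocab = {word for pair in pattern_counts for word in pair}
    if x in pattern_vocab and y in pattern_vocab:
        count = pattern_counts.get((x, y), 0)
        return count / (count + 1), "pattern"
    return cosine(vectors[x], vectors[y]), "distributional"


# Toy data, invented for the example.
pattern_counts = {("dog", "animal"): 12, ("cat", "animal"): 9}
vectors = {
    "dog": [1.0, 0.2],
    "cat": [0.9, 0.3],
    "animal": [0.8, 0.4],
    "corgi": [1.0, 0.25],  # never seen in any Hearst pattern
}

seen_pair = hypernymy_score("dog", "animal", pattern_counts, vectors)
unseen_word = hypernymy_score("corgi", "animal", pattern_counts, vectors)
```

In this sketch `"corgi"` is routed to the distributional branch because it occurs in no extracted pattern, which is the sparsity case the abstract singles out.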

Enriching Large-Scale Eventuality Knowledge Graph with Entailment Relations

Jun 21, 2020

Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng, Lifeng Shang

Abstract: Computational and cognitive studies suggest that the abstraction of eventualities (activities, states, and events) is crucial for humans to understand daily eventualities. In this paper, we propose a scalable approach to model the entailment relations between eventualities (e.g., "eat an apple" entails "eat fruit"). As a result, we construct a large-scale eventuality entailment graph (EEG), which has 10 million eventuality nodes and 103 million entailment edges. Detailed experiments and analysis demonstrate the effectiveness of the proposed approach and the quality of the resulting knowledge graph. Our datasets and code are available at https://github.com/HKUST-KnowComp/ASER-EEG.

* Accepted by AKBC 2020

Multiplex Word Embeddings for Selectional Preference Acquisition

Jan 09, 2020

Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Abstract: Conventional word embeddings represent words with fixed vectors, which are usually trained based on co-occurrence patterns among words. The power of such representations is limited, however, because the same word may function differently under different syntactic relations. To address this limitation, one solution is to incorporate relational dependencies of different words into their embeddings. Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words. As a result, each word has a center embedding to represent its overall semantics, and several relational embeddings to represent its relational dependencies. Compared to existing models, our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness. Moreover, to accommodate various relations, we use a small dimension for relational embeddings, and our model is able to keep their effectiveness. Experiments on selectional preference acquisition and word similarity demonstrate the effectiveness of the proposed model, and a further study of scalability shows that our embeddings need only 1/20 of the original embedding size to achieve better performance.

* EMNLP-IJCNLP 2019
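
The data layout described in the abstract, one center embedding per word plus a small embedding per (word, relation), can be sketched as follows. The abstract does not state how the two are combined; the per-relation projection added to the center vector, the class name `MultiplexEmbedding`, the dimensions, and the relation labels are all hypothetical choices for illustration.

```python
import random


class MultiplexEmbedding:
    """Toy container: a center vector per word, a small vector per (word, relation)."""

    def __init__(self, vocab, relations, center_dim=8, relation_dim=2, seed=0):
        rng = random.Random(seed)

        def rand_vec(dim):
            return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

        self.center = {w: rand_vec(center_dim) for w in vocab}
        # Relational embeddings are kept much smaller than the center ones.
        self.relational = {(w, r): rand_vec(relation_dim)
                           for w in vocab for r in relations}
        # Hypothetical per-relation projection (center_dim x relation_dim)
        # that lifts the small vector into the center space.
        self.projection = {r: [rand_vec(relation_dim) for _ in range(center_dim)]
                           for r in relations}

    def embed(self, word, relation=None):
        base = self.center[word]
        if relation is None:
            return list(base)
        small = self.relational[(word, relation)]
        rows = self.projection[relation]
        return [b + sum(p * s for p, s in zip(row, small))
                for b, row in zip(base, rows)]


model = MultiplexEmbedding(vocab=["eat", "apple"], relations=["nsubj", "dobj"])
as_object = model.embed("apple", "dobj")
as_subject = model.embed("apple", "nsubj")
overall = model.embed("apple")
```

Because the relation-specific part is low-dimensional, adding a new relation costs only a small vector per word plus one projection, which is one way to read the abstract's point about avoiding unnecessary sparseness.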
