Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Menghan Zhang

On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

Feb 26, 2024

Ningyu Xu, Qi Zhang, Menghan Zhang, Peng Qian, Xuanjing Huang

Abstract:Probing and enhancing large language models' reasoning capacity remains a crucial open question. Here we re-purpose the reverse dictionary task as a case study to probe LLMs' capacity for conceptual inference. We use in-context learning to guide the models to generate the term for an object concept implied in a linguistic description. Models robustly achieve high accuracy in this task, and their representation space encodes information about object categories and fine-grained features. Further experiments suggest that the conceptual inference ability as probed by the reverse-dictionary task predicts model's general reasoning performance across multiple benchmarks, despite similar syntactic generalization behaviors across models. Explorative analyses suggest that prompting LLMs with description$\Rightarrow$word examples may induce generalization beyond surface-level differences in task construals and facilitate models on broader commonsense reasoning problems.

* 21 pages, 13 figures

Via

Access Paper or Ask Questions

Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Oct 19, 2023

Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang

Figure 1 for Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Figure 2 for Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Figure 3 for Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Figure 4 for Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Abstract:Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for low-resource ones, which poses an ongoing challenge. It is unclear whether we have reached the limits of implicit cross-lingual generalization and if explicit knowledge transfer is viable. In this paper, we investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability among the spaces of structural concepts within each language for both encoder-only and decoder-only LLMs. We then propose a meta-learning-based method to learn to align conceptual spaces of different languages, which facilitates zero-shot and few-shot generalization in concept classification and also offers insights into the cross-lingual in-context learning phenomenon. Experiments on syntactic analysis tasks show that our approach achieves competitive results with state-of-the-art methods and narrows the performance gap between languages, particularly benefiting those with limited resources.

* Findings of EMNLP 2023

Via

Access Paper or Ask Questions

Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

May 22, 2023

Xiao Wang, Weikang Zhou, Qi Zhang, Jie Zhou, Songyang Gao, Junzhe Wang, Menghan Zhang, Xiang Gao, Yunwen Chen, Tao Gui

Figure 1 for Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

Figure 2 for Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

Figure 3 for Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

Figure 4 for Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

Abstract:Pretrained language models have achieved remarkable success in various natural language processing tasks. However, pretraining has recently shifted toward larger models and larger data, and this has resulted in significant computational and energy costs. In this paper, we propose Influence Subset Selection (ISS) for language model, which explicitly utilizes end-task knowledge to select a tiny subset of the pretraining corpus. Specifically, the ISS selects the samples that will provide the most positive influence on the performance of the end-task. Furthermore, we design a gradient matching based influence estimation method, which can drastically reduce the computation time of influence. With only 0.45% of the data and a three-orders-of-magnitude lower computational cost, ISS outperformed pretrained models (e.g., RoBERTa) on eight datasets covering four domains.

* Accepted by ACL2023

Via

Access Paper or Ask Questions

Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Dec 21, 2022

Ningyu Xu, Tao Gui, Ruotian Ma, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang

Figure 1 for Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Figure 2 for Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Figure 3 for Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Figure 4 for Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Abstract:Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.

* EMNLP 2022

Via

Access Paper or Ask Questions