Tien-Yuan Huang

Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models

Dec 17, 2024

Chia-Hsuan Chang, Tien-Yuan Huang, Yi-Hang Tsai, Chia-Ming Chang, San-Yih Hwang

Abstract: Recent works in clustering-based topic models perform well in monolingual topic identification by introducing a pipeline to cluster the contextualized representations. However, the pipeline is suboptimal in identifying topics across languages due to the presence of language-dependent dimensions (LDDs) generated by multilingual language models. To address this issue, we introduce a novel, SVD-based dimension refinement component into the pipeline of the clustering-based topic model. This component effectively neutralizes the negative impact of LDDs, enabling the model to accurately identify topics across languages. Our experiments on three datasets demonstrate that the updated pipeline with the dimension refinement component generally outperforms other state-of-the-art cross-lingual topic models.

* Accepted to 18th BUCC Workshop at COLING 2025
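
The abstract does not spell out how the SVD-based refinement is carried out, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes document embeddings from a multilingual model are already available as a NumPy array with one language label per row. The function name `refine_dimensions`, the `n_drop` parameter, and the between-language variance score used to flag a direction as language-dependent are all hypothetical choices made for illustration.

```python
import numpy as np


def refine_dimensions(embeddings, languages, n_drop=1):
    """Project out the singular directions that best separate languages.

    Illustrative assumption only: a direction counts as "language-dependent"
    when the per-language means along it vary a lot relative to the total
    variance along it.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)  # center before SVD

    # Rows of vt are orthonormal directions in embedding space.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt.T  # coordinates of every document along each direction

    langs = np.asarray(languages)
    lang_means = np.stack(
        [scores[langs == lang].mean(axis=0) for lang in np.unique(langs)]
    )
    between = lang_means.var(axis=0)      # spread of the language means
    total = scores.var(axis=0) + 1e-12    # guard against division by zero
    dependence = between / total

    # Indices of the n_drop most language-dependent directions.
    drop = np.argsort(dependence)[::-1][:n_drop]
    basis = vt[drop]

    # Remove the component of every embedding that lies in those directions.
    refined = X - (X @ basis.T) @ basis
    return refined, drop
```

Under these assumptions, the refined embeddings would then be passed to whichever clustering step the pipeline already uses, so that clusters form around topics rather than around languages.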