Abstract: In this work, we present the AutoTM 2.0 framework for optimizing additively regularized topic models. Compared with the previous version, this release includes such valuable improvements as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists and non-specialists alike to work with text documents, whether to conduct exploratory data analysis or to perform clustering on an interpretable set of features. Quality evaluation is based on specially developed metrics such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments. We show that AutoTM 2.0 outperforms the previous AutoTM version, reporting results on five datasets with different characteristics and in two different languages.
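To make the coherence-based quality evaluation mentioned above concrete, the following is a minimal illustrative sketch of an NPMI-style topic coherence score. It is not AutoTM 2.0's actual API or metric implementation; the function name, arguments, and scoring details are assumptions for illustration only.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, tokenized_docs, eps=1e-12):
    """Illustrative NPMI-style coherence for one topic's top words.

    top_words      : list of the topic's highest-probability tokens
    tokenized_docs : list of documents, each a list of tokens
    (Hypothetical helper; not AutoTM 2.0's actual metric code.)
    """
    n_docs = len(tokenized_docs)
    doc_sets = [set(doc) for doc in tokenized_docs]
    # Document frequencies of single words.
    p = {w: sum(w in d for d in doc_sets) / n_docs for w in top_words}
    scores = []
    for w1, w2 in combinations(top_words, 2):
        # Joint document frequency of the word pair.
        p12 = sum(w1 in d and w2 in d for d in doc_sets) / n_docs
        if p12 == 0 or p[w1] == 0 or p[w2] == 0:
            scores.append(-1.0)  # convention: pairs that never co-occur score -1
            continue
        pmi = math.log(p12 / (p[w1] * p[w2]) + eps)
        scores.append(pmi / (-math.log(p12) + eps))
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging such a score over all topics gives a single number that an optimization pipeline can maximize, which is the general role coherence-style metrics play in frameworks of this kind.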
Abstract: The tasks of aspect identification and term extraction remain challenging in natural language processing. While supervised methods achieve high scores, they are hard to use in real-world applications due to the lack of labelled datasets. Unsupervised approaches outperform these methods on several tasks, but it is still a challenge to extract both an aspect and a corresponding term, particularly in the multi-aspect setting. In this work, we present a novel unsupervised neural network with a convolutional multi-attention mechanism that allows extracting (aspect, term) pairs simultaneously, and we demonstrate its effectiveness on a real-world dataset. We apply a special loss aimed at improving the quality of multi-aspect extraction. The experimental study demonstrates that with this loss we increase precision not only in the joint setting but also for aspect prediction alone.
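For readers unfamiliar with how unsupervised aspect extraction models are typically trained, the following is a minimal sketch of a hinge-style reconstruction objective in the spirit of ABAE-like models. It is not the paper's special multi-aspect loss; the function name, tensor shapes, and margin value are assumptions chosen only to illustrate the general idea of contrastive reconstruction training.

```python
import torch

def max_margin_loss(sent_emb, recon_emb, neg_embs, margin=1.0):
    """Illustrative hinge-style reconstruction loss for unsupervised
    aspect extraction (hypothetical baseline, not the paper's loss).

    sent_emb  : (batch, dim)     attention-weighted sentence embeddings
    recon_emb : (batch, dim)     embeddings reconstructed from aspect vectors
    neg_embs  : (batch, k, dim)  embeddings of k randomly sampled negative sentences
    """
    # Cosine similarity between each reconstruction and its own sentence.
    pos = torch.nn.functional.cosine_similarity(recon_emb, sent_emb, dim=-1)   # (batch,)
    # Cosine similarity between each reconstruction and its negative samples.
    neg = torch.nn.functional.cosine_similarity(
        recon_emb.unsqueeze(1), neg_embs, dim=-1)                               # (batch, k)
    # Push the reconstruction closer to its own sentence than to negatives by a margin.
    return torch.clamp(margin - pos.unsqueeze(1) + neg, min=0.0).sum(dim=1).mean()
```

A specialized multi-aspect loss would replace or augment such an objective with terms that encourage distinct attention heads to attach to distinct aspects, which is the kind of improvement the abstract refers to.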