Abstract: We present PyThaiNLP, a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python. It provides a wide range of software, models, and datasets for Thai. We first give a brief historical overview of tools for Thai NLP prior to the development of PyThaiNLP. We then outline the functionalities it provides, as well as its datasets and pre-trained language models. Next, we summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.
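To illustrate the kind of functionality the library exposes, below is a minimal usage sketch of its tokenization and POS-tagging APIs; the default engine names and the exact tokenization output may differ across PyThaiNLP versions.

```python
# Minimal usage sketch of PyThaiNLP (pip install pythainlp).
# Engine names and exact outputs may vary by library version.
from pythainlp.tokenize import word_tokenize
from pythainlp.tag import pos_tag

text = "ผมรักภาษาไทย"                         # "I love the Thai language"
tokens = word_tokenize(text, engine="newmm")   # dictionary-based maximal matching
print(tokens)                                  # e.g. ['ผม', 'รัก', 'ภาษาไทย']
print(pos_tag(tokens))                         # list of (token, POS-tag) pairs
```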
Abstract: Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract, human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize the relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley values, Integrated Gradients, or LRP. Our proposed methods prove to be practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.
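The contrast between optimizing variance (PCA) and optimizing relevance can be sketched in a few lines. The toy example below assumes, purely for illustration, that relevance decomposes bilinearly into an activation vector a and an LRP-style "context" vector c, so that the relevance captured by an orthonormal subspace U is sum_i a_i^T U U^T c_i; the variable names and the eigenvalue-based solution are illustrative, not the paper's implementation.

```python
# Toy sketch contrasting PCA (maximize variance) with a PRCA-style objective
# (maximize relevance of the projected activations).
# Assumption (illustrative): relevance of subspace U is sum_i a_i^T U U^T c_i,
# where a_i are activations and c_i are LRP-style context vectors.
import numpy as np

rng = np.random.default_rng(0)
n, dim, d = 1000, 64, 4            # samples, activation size, subspace size
A = rng.standard_normal((n, dim))  # activations at some layer
C = rng.standard_normal((n, dim))  # context vectors carrying relevance backward

# PCA: top eigenvectors of the activation covariance.
cov = (A.T @ A) / n
U_pca = np.linalg.eigh(cov)[1][:, -d:]

# PRCA-style: top eigenvectors of the symmetrized activation-context
# cross-covariance, i.e. directions that carry the most relevance.
cross = (A.T @ C + C.T @ A) / (2 * n)
U_prca = np.linalg.eigh(cross)[1][:, -d:]

relevance = lambda U: np.sum((A @ U) * (C @ U))  # sum_i a_i^T U U^T c_i
print("relevance in PCA subspace :", relevance(U_pca))
print("relevance in PRCA subspace:", relevance(U_prca))
```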
Abstract: Word segmentation is a fundamental pre-processing step for Thai natural language processing. The current off-the-shelf solutions are not benchmarked consistently, so it is difficult to compare their trade-offs. We conducted a speed and accuracy comparison of the popular systems on three different domains and found that the state-of-the-art deep learning system is slow and, moreover, does not use sub-word structures to guide the model. Here, we propose a fast and accurate neural Thai word segmenter that uses dilated CNN filters to capture the environment of each character and uses syllable embeddings as features. Our system runs at least 5.6x faster and outperforms the previous state-of-the-art system on some domains. In addition, we develop the first ML-based Thai orthographical syllable segmenter, which yields syllable embeddings to be used as features by the word segmenter.
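The architectural idea, stacked dilated convolutions over characters combined with syllable features, can be sketched as follows. This is an illustrative PyTorch sketch, not the paper's exact architecture: the layer sizes, dilation rates, and the way precomputed per-character syllable embeddings are concatenated are all assumptions.

```python
# Illustrative sketch of a character-level Thai word segmenter built from stacked
# dilated 1-D convolutions: each character is classified as starting a new word
# (1) or continuing the current word (0). Hyperparameters are assumptions.
import torch
import torch.nn as nn

class DilatedCNNSegmenter(nn.Module):
    def __init__(self, n_chars, char_dim=32, syl_dim=16, hidden=64,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Syllable embeddings are assumed precomputed per character position
        # (e.g., the embedding of the syllable that character belongs to).
        in_dim = char_dim + syl_dim
        layers = []
        for dil in dilations:
            # padding = dilation keeps the sequence length unchanged (kernel 3).
            layers += [nn.Conv1d(in_dim, hidden, kernel_size=3,
                                 padding=dil, dilation=dil),
                       nn.ReLU()]
            in_dim = hidden
        self.convs = nn.Sequential(*layers)
        self.out = nn.Conv1d(hidden, 1, kernel_size=1)  # per-character logit

    def forward(self, char_ids, syl_feats):
        # char_ids: (batch, seq_len); syl_feats: (batch, seq_len, syl_dim)
        x = torch.cat([self.char_emb(char_ids), syl_feats], dim=-1)
        x = self.convs(x.transpose(1, 2))     # (batch, channels, seq_len)
        return self.out(x).squeeze(1)         # (batch, seq_len) boundary logits

model = DilatedCNNSegmenter(n_chars=180)
logits = model(torch.randint(0, 180, (2, 40)), torch.randn(2, 40, 16))
print(logits.shape)  # torch.Size([2, 40])
```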