Abstract:This paper describes neural network based approaches to the process of the formation and splitting of word-compounding, respectively known as the Sandhi and Vichchhed, in Sanskrit language. Sandhi is an important idea essential to morphological analysis of Sanskrit texts. Sandhi leads to word transformations at word boundaries. The rules of Sandhi formation are well defined but complex, sometimes optional and in some cases, require knowledge about the nature of the words being compounded. Sandhi split or Vichchhed is an even more difficult task given its non uniqueness and context dependence. In this work, we propose the route of formulating the problem as a sequence to sequence prediction task, using modern deep learning techniques. Being the first fully data driven technique, we demonstrate that our model has an accuracy better than the existing methods on multiple standard datasets, despite not using any additional lexical or morphological resources. The code is being made available at https://github.com/IITD-DataScience/Sandhi_Prakarana
Abstract:This paper presents first benchmark corpus of Sanskrit Pratyaya (suffix) and inflectional words (padas) formed due to suffixes along with neural network based approaches to process the formation and splitting of inflectional words. Inflectional words spans the primary and secondary derivative nouns as the scope of current work. Pratyayas are an important dimension of morphological analysis of Sanskrit texts. There have been Sanskrit Computational Linguistics tools for processing and analyzing Sanskrit texts. Unfortunately there has not been any work to standardize & validate these tools specifically for derivative nouns analysis. In this work, we prepared a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of tools. We also present our own neural approach for derivative nouns analysis while evaluating the same on most prominent Sanskrit Morphological Analysis tools. This benchmark will be freely dedicated and available to researchers worldwide and we hope it will motivate all to improve morphological analysis in Sanskrit Language.