Hossein Nasr Esfahani

SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

Jun 20, 2016

Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, Azadeh Shakery

Figures 1-4 for SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

Abstract: There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among the many stemming techniques, statistical stemming has been shown to be effective in a number of languages, particularly highly inflected ones. Common statistical techniques rely heavily on string similarity in terms of prefix and suffix matching. Since infixes are common in irregular and informal inflections in morphologically complex texts, stemming such texts also requires finding infixes. In this paper we propose a method for finding affixes in different positions of a word. The method derives statistical inflectional rules from the minimum edit distance table of word pairs and the likelihoods of the rules in a language. These rules are used to statistically stem words and can be applied in different text mining tasks. Experimental results on the CLEF 2008 and CLEF 2009 English-Persian CLIR tasks indicate that the proposed method significantly outperforms all the baselines in terms of MAP.
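The abstract does not spell out how rules are read off the edit distance table, so the following is only a minimal sketch of the general idea, not the authors' implementation. It builds a standard Levenshtein table for a word pair, backtraces one optimal path to list the character edits (which may fall at any position, including inside the word), and counts how often each edit occurs over a set of pairs. The function names, the unit edit costs, the rule representation, and the use of raw counts as a stand-in for rule likelihoods are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance_table(source, target):
    """Build the Levenshtein table; cell [i][j] is the distance
    between source[:i] and target[:j] (unit costs, an assumption)."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete every character of source[:i]
    for j in range(cols):
        table[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table


def extract_edits(source, target):
    """Backtrace one optimal path and return the edits as
    (operation, position in source, source char, target char)."""
    table = edit_distance_table(source, target)
    i, j = len(source), len(target)
    edits = []
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and table[i][j] == table[i - 1][j - 1]):
            i, j = i - 1, j - 1  # characters match, no edit recorded
        elif i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + 1:
            edits.append(("substitute", i - 1, source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + 1:
            edits.append(("delete", i - 1, source[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", i, "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def rule_counts(word_pairs):
    """Count position-independent edits over many pairs; raw counts are
    a hypothetical stand-in for the rule likelihoods in the abstract."""
    counts = Counter()
    for source, target in word_pairs:
        for operation, _, src_char, tgt_char in extract_edits(source, target):
            counts[(operation, src_char, tgt_char)] += 1
    return counts


if __name__ == "__main__":
    print(extract_edits("sing", "sang"))
    print(rule_counts([("sing", "sang"), ("ring", "rang"), ("walk", "walked")]))
```

In this sketch the pair "sing"/"sang" yields a single substitution in the middle of the word, which is the kind of change that pure prefix and suffix matching would not capture. Backtracing returns only one of possibly several optimal paths, so a fuller treatment would need a policy for ties.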
