Title: Domain and Language Independent Feature Extraction for Statistical Text Categorization

Jul 02, 1996

Thomas Bayer, Ingrid Renz, Michael Stein, Ulrich Kressel

Abstract: A generic system for text categorization is presented that uses a representative text corpus to adapt its processing steps: feature extraction, dimension reduction, and classification. Feature extraction learns features automatically from the corpus by reducing the actual word forms, using statistical information from the corpus together with general linguistic knowledge. The dimension of the feature vector is then reduced by a linear transformation that keeps the essential information. Classification follows a minimum least-squares approach based on polynomials. The described system can be readily adapted to new domains or new languages. In application, the system is reliable and fast, and it runs completely automatically. It is shown that the text categorizer works successfully both on text generated by document image analysis (DIA) and on ground-truth data.
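The three processing steps named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the suffix list in `SUFFIXES` is a crude, hypothetical stand-in for the corpus-driven reduction of word forms; the `min_df` document-frequency threshold is hypothetical; principal component analysis via SVD is used as *one* possible linear transformation (the abstract does not name one); and the classifier is a complete second-degree polynomial fitted by least squares to one-hot target vectors. All function names are hypothetical.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical suffix list: a crude stand-in for the paper's corpus-driven
# reduction of word forms, NOT the authors' actual procedure.
SUFFIXES = ("ings", "ing", "es", "ed", "s")


def reduce_word(word):
    """Map a word form to a shorter feature by stripping one known suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into alphabetic words, and reduce each word form."""
    return [reduce_word(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_vocabulary(docs, min_df=2):
    """Keep reduced word forms occurring in at least min_df documents (assumed rule)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    terms = sorted(t for t, n in df.items() if n >= min_df)
    return {t: i for i, t in enumerate(terms)}


def vectorize(docs, vocab):
    """Build a term-count feature vector for every document."""
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in tokenize(doc):
            if tok in vocab:
                X[row, vocab[tok]] += 1.0
    return X


def fit_linear_reduction(X, k):
    """PCA via SVD: one possible linear dimension reduction (an assumption)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # projection matrix of shape (n_features, k)


def poly2(Z):
    """Complete second-degree polynomial expansion: 1, z_i, z_i * z_j (i <= j)."""
    n, k = Z.shape
    cols = [np.ones(n)] + [Z[:, i] for i in range(k)]
    cols += [Z[:, i] * Z[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)


def fit_poly_classifier(Z, labels, n_classes):
    """Least-squares fit of the polynomial features to one-hot target vectors."""
    Y = np.eye(n_classes)[labels]
    A, *_ = np.linalg.lstsq(poly2(Z), Y, rcond=None)
    return A


def predict(A, Z):
    """Assign each document to the class with the largest polynomial output."""
    return np.argmax(poly2(Z) @ A, axis=1)


if __name__ == "__main__":
    docs = [
        "invoice payment due amount paid",
        "payment of invoice amount billed",
        "meeting scheduled agenda minutes",
        "agenda for meeting minutes attached",
    ]
    labels = np.array([0, 0, 1, 1])
    vocab = build_vocabulary(docs, min_df=2)
    X = vectorize(docs, vocab)
    mean, P = fit_linear_reduction(X, k=1)
    Z = (X - mean) @ P  # project centered vectors into the reduced space
    A = fit_poly_classifier(Z, labels, n_classes=2)
    print(predict(A, Z))
```

Solving with `np.linalg.lstsq` instead of explicitly inverting a moment matrix keeps the sketch stable when polynomial terms are collinear, as happens here when only one reduced dimension is kept. A real system would learn the word-form reduction from corpus statistics rather than from a fixed suffix list.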

* Proceedings of the Workshop on Language Engineering for Document Analysis and Recognition, ed. by L. Evett and T. Rose, part of the AISB 1996 Workshop Series, April 1996, Sussex University, England, pp. 21-32 (ISBN 0 905 488628)
* 12 pages, TeX file, 9 PostScript figures, uses epsf.sty
