Evaluation of Text Classifier Based on Different Stemming Algorithms
Ebtehal Talib Kudair
Ministry of Higher Education and Scientific Research, Iraq
Abstract
Text classification is an important field of machine learning. It is a supervised learning method that assigns texts to groups according to predefined categories. Text generally carries a lot of information, but in unstructured form, and this unstructured data must be converted into structured data. In this paper, texts are classified using the traditional k-Nearest Neighbor (KNN) algorithm, and the performance of the KNN classifier is compared across text preprocessing that uses different stemming algorithms (Porter Stemmer and Snowball Stemmer). The Snowball stemmer reduced the number of features in comparison with the Porter stemmer, and the results showed that the classifier is more accurate when the Snowball stemmer is used.
Keywords: Text Classification; k-Nearest Neighbor algorithm; Stemming
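
The comparison described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The two stemmers below are hypothetical toy stand-ins (they are not the Porter or Snowball algorithms), the six-document corpus is invented, and cosine similarity over raw term counts is assumed because the abstract does not state the weighting or distance measure. The sketch only shows the general idea: a stemmer that conflates more word forms yields fewer features, and the same KNN classifier can then be run on each representation. In practice one would likely substitute real implementations, for example NLTK's `PorterStemmer` and `SnowballStemmer`.

```python
import math
from collections import Counter


def light_stem(word):
    """Toy stemmer (NOT Porter or Snowball): strips a trailing 's' only."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def aggressive_stem(word):
    """Toy stemmer (NOT Porter or Snowball): strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vectorize(text, stem):
    """Bag-of-words term counts over stemmed, lowercased tokens."""
    return Counter(stem(tok) for tok in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vocabulary_size(train, stem):
    """Number of distinct features (stems) in the training texts."""
    return len({stem(tok) for text, _ in train for tok in text.lower().split()})


def knn_predict(train, query, stem, k=3):
    """Majority vote among the k training texts most similar to the query."""
    q = vectorize(query, stem)
    scored = sorted(
        ((cosine(vectorize(text, stem), q), label) for text, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented toy corpus, for illustration only.
TRAIN = [
    ("players scored goals", "sport"),
    ("player scoring goal", "sport"),
    ("team played matches", "sport"),
    ("markets traded stocks", "finance"),
    ("market trading stock", "finance"),
    ("banks raised rates", "finance"),
]

if __name__ == "__main__":
    for name, stem in (("light", light_stem), ("aggressive", aggressive_stem)):
        size = vocabulary_size(TRAIN, stem)
        label = knn_predict(TRAIN, "trading stocks", stem)
        print(f"{name}: {size} features, predicted label = {label}")
```

On a corpus this small both toy stemmers lead to the same prediction; the sketch shows only the difference in feature count and does not reproduce the accuracy results reported in the paper.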