Keyword Extraction Performance Analysis

Abstract

This paper presents a combined survey and evaluation of keyword extraction methods, comparing them on datasets of various sizes, forms, and genres. We use four datasets: Amazon product data (Automotive), SemEval 2010, TMDB, and Stack Exchange. In addition, a subset of 100 Amazon product reviews is annotated and used for evaluation, to our knowledge for the first time. The datasets are evaluated with five Natural Language Processing approaches (3 unsupervised and 2 supervised): TF-IDF, RAKE, TextRank, LDA, and a Shallow Neural Network. We use a ten-fold cross-validation scheme and measure the performance of these approaches using precision, recall, and F-score. Our analysis and results provide guidelines on which approaches suit different types of datasets. Furthermore, our results indicate that certain approaches perform better on certain datasets because of inherent characteristics of the data.
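The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes exact matching of keywords after lowercasing and whitespace trimming, which may differ from the matching scheme used in the paper; the function name `keyword_prf` and the sample keywords are hypothetical and are not taken from the paper's data.

```python
def keyword_prf(predicted, gold):
    """Return (precision, recall, F-score) for one document's keywords.

    Assumption: a predicted keyword counts as correct only if it exactly
    matches a gold keyword after lowercasing and trimming whitespace.
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}
    hits = len(pred & ref)  # keywords found in both sets

    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # Harmonic mean of precision and recall; defined as 0 when nothing matches.
    f_score = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_score


# Hypothetical example: extracted keywords vs. annotated keywords for a review.
predicted = ["brake pads", "Fit", "shipping", "price"]
gold = ["brake pads", "fit", "noise"]

p, r, f = keyword_prf(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
```

In a ten-fold cross-validation setting, scores of this kind would typically be computed on each held-out fold and then averaged across the ten folds.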

Publication
2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)
Mayuresh Savargaonkar
Ph.D.

My research interests include verification and validation of modern systems, electric vehicle charging infrastructure, Li-ion battery prognostics using customized deep learning, and explainable AI.