Sentiment classification in settings with large volumes of unlabelled data (such as tweets and other big data applications) requires human annotators to label the data, which is expensive and time-consuming. Active Learning (AL) addresses this problem by greedily selecting the most informative instances from the unlabelled data and submitting only those to a human annotator, thereby reducing the time, cost and effort needed to label large amounts of data. Compared with passive learning, AL reduces the amount of data needed for learning, produces higher-quality labelled data, shortens the running time of the classification process and improves predictive accuracy. Different query strategies have been proposed for choosing the most informative instances from the unlabelled data. In this work, we carry out a comparative performance evaluation of sentiment classification in a pool-based Active Learning scenario using three query strategies: Entropy Sampling (an Uncertainty Sampling strategy), and Kullback-Leibler divergence and Vote Entropy (both Query By Committee strategies). The evaluation metrics are Accuracy, Weighted Precision, Weighted Recall, Weighted F-measure, Root Mean Square Error, Weighted True Positive Rate and Weighted False Positive Rate. We also measure several time costs of the Active Learning process, viz. Accumulative Iteration time, Iteration time, Training time, Instances selection time and Test time. The empirical results show that the Uncertainty Sampling query strategy achieved better overall performance than Query By Committee in the sentiment classification of a movie reviews dataset. © Springer Science+Business Media Singapore 2016.
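For readers unfamiliar with the three query strategies named in the abstract, the following is a minimal sketch of how each informativeness score is commonly defined in the active learning literature. It is not the authors' implementation: the function names (`entropy`, `vote_entropy`, `mean_kl_divergence`, `select_query`) are hypothetical, and the inputs are assumed to be class-probability vectors or predicted labels produced by whatever classifier or committee is in use.

```python
import math
from collections import Counter


def entropy(probs):
    """Entropy Sampling score: Shannon entropy (in nats) of one
    classifier's predicted class distribution for an instance."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Vote Entropy score: entropy of the label votes cast by the
    committee members (one predicted label per member)."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((v / total) * math.log(v / total) for v in counts.values())


def mean_kl_divergence(member_probs):
    """KL-divergence score: average divergence of each committee
    member's class distribution from the committee's mean distribution."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    consensus = [
        sum(m[j] for m in member_probs) / n_members for j in range(n_classes)
    ]
    total = 0.0
    for m in member_probs:
        # Terms with p == 0 contribute nothing; consensus > 0 whenever p > 0.
        total += sum(p * math.log(p / q) for p, q in zip(m, consensus) if p > 0)
    return total / n_members


def select_query(scores):
    """Greedy pool-based selection: index of the highest-scoring instance."""
    return max(range(len(scores)), key=scores.__getitem__)
```

In a pool-based loop, one of these scores would be computed for every unlabelled instance, and the instance returned by `select_query` would be sent to the annotator before the model is retrained.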
L. K. Devi, Subathra P., and Dr. (Col.) Kumar P. N., “Performance evaluation of sentiment classification using query strategies in a pool based active learning scenario”, Advances in Intelligent Systems and Computing, vol. 412, pp. 65-75, 2016.