A novel flexible feature extraction algorithm for Spanish tweet sentiment analysis based on the context of words.
Date
2022
Abstract
A tweet polarity classifier is presented with four categories: positive, neutral, negative and no opinion. A grouping genetic algorithm performs feature extraction on the tweets. Features are defined from entropy and from semantic context, described as the relative positions between words. The feature selection is flexible because it is customized to each word studied in the tweets. The algorithm has been applied to two Spanish corpora, one of 3,413 tweets and one of more than 63,000 tweets, to classify an evaluation set of 1,899 tweets. The results were evaluated with the M-F1 and accuracy metrics. The algorithm improved on both metrics for both corpora compared with previous work in the literature, achieving an M-F1 of 0.640 and an accuracy of 0.689. The flexibility of the feature extraction has been the major qualitative improvement of the classifier.
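As an illustration of the two evaluation metrics, the following is a minimal sketch and not the authors' evaluation code. It assumes that M F1 denotes macro-averaged F1 computed as the unweighted mean of the per-class F1 scores; the paper's evaluation may define it differently. The label names (`P`, `NEU`, `N`, `NONE`) and the function names are hypothetical.

```python
# Hypothetical label names for the four categories named in the abstract:
# positive, neutral, negative and no opinion.
LABELS = ("P", "NEU", "N", "NONE")


def accuracy(gold, pred):
    """Fraction of tweets whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed meaning of M F1)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            # A class that is never predicted correctly contributes 0.
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only; these are not results from the paper.
gold = ["P", "P", "N", "NEU", "NONE", "N"]
pred = ["P", "N", "N", "NEU", "NONE", "NONE"]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Because macro-averaging gives every class the same weight, a classifier that ignores a rare category such as neutral is penalized more heavily by this metric than by accuracy.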