A novel flexible feature extraction algorithm for Spanish tweet sentiment analysis based on the context of words.

Sánchez Berriel, Isabel; García Díaz, Pilar; Pontiel Martín, Diego; González Ávila, José Luis

doi:10.1016/j.eswa.2022.118817

Date

2022

URI

http://riull.ull.es/xmlui/handle/915/36225

Abstract

A tweet polarity classifier with four categories is presented: positive, neutral, negative, and no opinion. A grouping genetic algorithm performs feature extraction on the reviews. Features are defined from entropy and from semantic context, described as the relative positions between words. Feature selection is flexible because it is customized to each word studied in the reviews. The algorithm was applied to two Spanish-language corpora, of 3,413 tweets and more than 63,000 tweets, to classify an evaluation set of 1,899 reviews. The results were evaluated with the M F1 and accuracy metrics. The algorithm improved on previously published results for both metrics and on both corpora, achieving an M F1 of 0.640 and an accuracy of 0.689. Flexibility in feature extraction was the classifier's major qualitative improvement.
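As an illustration of the two evaluation metrics named in the abstract, the following minimal sketch computes accuracy and a macro-averaged F1 over four polarity classes. It rests on assumptions not stated in this record: that "M F1" denotes macro-averaged F1, that the macro average is the unweighted mean of per-class F1 scores (some evaluations instead take the harmonic mean of macro precision and macro recall), and that the label names `P`, `NEU`, `N`, and `NONE` stand for the four categories. The function names are hypothetical and this is not the authors' evaluation code.

```python
# Assumed label names for positive, neutral, negative and no opinion.
LABELS = ["P", "NEU", "N", "NONE"]


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (one common macro-F1 definition)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A class with no true positives contributes an F1 of 0.
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(labels)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the corpora in the paper.
    y_true = ["P", "P", "N", "NONE", "NEU", "N"]
    y_pred = ["P", "N", "N", "NONE", "P", "N"]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because every class carries equal weight in the macro average, a rarely predicted class such as neutral can pull the macro F1 well below the accuracy, which is one reason the two metrics are often reported together.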

Except where otherwise noted, this item's license is described as Creative Commons License (Attribution-NonCommercial-NoDerivatives 4.0 International)