DETECTION OF AGGRESSIVE RHETORIC IN TEXT USING MACHINE LEARNING ALGORITHMS
Abstract
Automated news processing enables the classification and analysis of large volumes of data to detect fake news, aggressive rhetoric, and provocative calls for unauthorized actions. Cyber aggression has negative effects, such as harming, threatening, or harassing a person. The importance of countering disinformation and ensuring information security has grown significantly since the beginning of the full-scale Russian invasion of Ukraine. The growing volume of news, messages, and tweets requires choosing an optimal model for detecting aggression in textual data in order to mitigate its impact on people’s lives.
The paper implements models for classifying textual information that contains aggressive vocabulary and emotional expressions. A dataset of more than 4,000 items from electronic news media and social networks, in Ukrainian and Russian, was used in development. The aggression detection models were developed in several stages: data pre-processing (tokenization, stemming, lemmatization, and stop-word removal); text vectorization using TF–IDF; implementation of machine learning and deep learning algorithms; and comparison of the models using classification reports and confusion matrices.
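As an illustration of the vectorization stage, the following is a minimal sketch of TF–IDF weighting in plain Python. It assumes the common variant tf × log(N / df); the abstract does not state which TF–IDF formula or library the authors used. The function name `tf_idf` and the toy English tokens are hypothetical and stand in for already pre-processed Ukrainian or Russian text.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc_length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy documents, invented for illustration; they are assumed to be
# tokenized, lemmatized, and stripped of stop words already.
docs = [
    ["attack", "enemy", "destroy"],
    ["weather", "report", "city"],
    ["enemy", "city", "report"],
]
vectors = tf_idf(docs)
```

In this sketch a term that occurs in only one document ("attack") receives a higher weight than a term shared by several documents ("enemy"), which is the property that makes TF–IDF features useful as classifier input.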
The effectiveness of recognizing aggressive rhetoric in text messages was studied using a naive Bayes classifier, support vector machines, k-nearest neighbors, random forest, logistic regression, and recurrent neural networks with LSTM and bidirectional LSTM architectures. The classification models were evaluated using the accuracy, precision, recall, and F1-score metrics. It was established that the class balance of the training sample significantly affects classification accuracy. With the unbalanced dataset, the predominance of non-aggressive samples led to more errors in predicting aggressively labeled text. The accuracy of machine learning and deep learning models trained on balanced data reaches 81–85%. A linear correlation between fake news and aggressive rhetoric in information messages was found, which can be used as an additional feature for classifying news materials.
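The four evaluation metrics named above can be derived from the cells of a binary confusion matrix. The sketch below is an illustration only: the function name `binary_report` is hypothetical, and the labels are invented to mimic an unbalanced sample (90 non-aggressive items, 10 aggressive items); they are not results from the paper.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells for the positive ("aggressive") class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels: 0 = non-aggressive, 1 = aggressive.
y_true = [0] * 90 + [1] * 10
# Hypothetical classifier output: 2 false alarms, 4 of 10 aggressive found.
y_pred = [0] * 88 + [1] * 2 + [1] * 4 + [0] * 6
report = binary_report(y_true, y_pred)
```

On these invented labels the accuracy is 0.92, yet recall for the aggressive class is only 0.4 and F1 is 0.5. This is why accuracy alone can hide errors on the minority class, and why per-class precision, recall, and F1 are worth reporting alongside it when the training sample is unbalanced.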
Key words: computer text analysis, sentiment analysis, deep learning, supervised machine learning.
DOI: http://dx.doi.org/10.30970/eli.22.4