COMPARISON OF ZERO-SHOT APPROACH AND RETRIEVAL-AUGMENTED GENERATION FOR ANALYZING THE TONE OF COMMENTS IN THE UKRAINIAN LANGUAGE
Abstract
Background. The constant growth of information, online news, and text messages in social networks poses new challenges for society and requires robust tools for analyzing information in real time, including determining its emotional tone. Understanding the emotional aspect directly affects customer satisfaction in various areas of activity and can suggest directions for improving processes. Therefore, the development of tools for analyzing the tonality of texts can make it possible to accurately recognize people's emotions, identify problems, and determine ways to solve them.
Methods. In this study, the Mistral-7B-UK large language model was applied to text tone analysis. Two datasets of comments in the Ukrainian language were utilized: one for binary classification, divided into negative and positive classes, and another for multiclass classification, which additionally included a neutral class. These datasets contain reviews about shops, restaurants, hotels, medical facilities, entertainment centers, fitness clubs, the provision of various services, etc.
Results and Discussion. For the zero-shot approach, prompts were constructed that describe the model's role, the output format, and additional explanations of the tonality classes. To implement Retrieval-Augmented Generation (RAG), Qdrant was utilized as the vector database, while the LangChain library enabled the integration of the large language model with external data sources. To determine the tonality of a text, the five chunks most semantically similar to it, each with a known tonality label, were retrieved from the vector database and inserted into predefined placeholders in the prompt template. The model's response was then generated using the provided context.
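The retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the placeholder names, and the plain-Python cosine-similarity search (standing in for Qdrant's vector search and LangChain's prompt templating) are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=5):
    """Return the k records most similar to the query.

    `store` is a list of (embedding, text, tonality) tuples,
    mimicking labelled chunks held in a vector database.
    """
    ranked = sorted(store, key=lambda rec: cosine(query_vec, rec[0]),
                    reverse=True)
    return ranked[:k]

# Hypothetical prompt template with placeholders for the retrieved
# context and the comment to classify.
PROMPT_TEMPLATE = (
    "You are a sentiment classifier for Ukrainian-language reviews.\n"
    "Similar labelled comments:\n{context}\n"
    "Comment: {comment}\n"
    "Answer with one word: positive, negative, or neutral."
)

def build_prompt(comment, query_vec, store, k=5):
    """Fill the template with the top-k retrieved labelled chunks."""
    hits = top_k(query_vec, store, k)
    context = "\n".join(f"- [{ton}] {text}" for _, text, ton in hits)
    return PROMPT_TEMPLATE.format(context=context, comment=comment)
```

In the actual pipeline the embeddings would come from an embedding model, the search from Qdrant, and the filled prompt would be passed to Mistral-7B-UK; the sketch only shows the data flow from retrieval to prompt.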
Conclusion. The research showed that the zero-shot approach achieves higher text tone analysis accuracy than the Retrieval-Augmented Generation approach: the overall accuracy was 94 % for binary classification and 75 % for multiclass classification. The benefit of using external sources appeared in the model's recognition of neutral tonality. However, it was observed that comments of opposing tonality could be retrieved as context because they describe the same object, which negatively affects the results.
Keywords: text tone, Large Language Model, zero-shot, Retrieval-Augmented Generation.
DOI: http://dx.doi.org/10.30970/eli.28.1