EXPLORATORY DATA ANALYSIS POSSIBILITY IN THE MEANING SPACE USING LARGE LANGUAGE MODELS

L. Lyashkevych, V. Lyashkevych, Roman Shuvar

Abstract


Today, data analysis plays a key role in studying and using large amounts of information to make informed decisions. Data has become one of the most valuable resources, and analytical tools are becoming increasingly important. Data analysis allows us to identify patterns and relationships between different factors. The methods of analysis are very diverse and depend on the specific task and the nature of the data. Each method has its advantages and limitations, and the choice of a particular one depends on the context.

Recent studies have shown that large language models (LLMs) achieve good results in the analysis of textual information and might serve as one of the tools for text analytics. Therefore, our research focuses on the ability of LLMs to perform data interpretation, arithmetic, and statistical operations in the meaning space. The proposed concepts were evaluated on simple cases, which was enough to better understand the effectiveness of LLMs as one of the tools for exploratory data analysis.

The practical results of the research indicate that the concept has some advantages over the closest analogues and identify several scientific problems that can be addressed in subsequent studies. Additionally, the research tools were developed as a chatbot system in the Telegram environment.

Keywords: exploratory data analysis, prompt engineering, large language models, chatbot systems, “meaning” space.





DOI: http://dx.doi.org/10.30970/eli.25.9
