EXPLORATORY DATA ANALYSIS POSSIBILITY IN THE MEANING SPACE USING LARGE LANGUAGE MODELS
Abstract
Today, data analysis plays a key role in studying and using large amounts of information to make informed decisions. Data has become one of the most valuable resources, and analytical tools are becoming increasingly important. Data analysis allows us to identify patterns and relationships between different factors. The methods of analysis are diverse and depend on the specific task and the nature of the data; each has its advantages and limitations, and the choice of a particular method depends on the context.
Recent studies have shown that large language models (LLMs) perform well in the analysis of textual information and may serve as one of the tools for text analytics. The main focus of our research is therefore the ability of LLMs to perform data interpretation, arithmetic, and statistical operations in the meaning space. The proposed concepts were evaluated on simple cases, which was sufficient to better understand the effectiveness of LLMs as a tool for exploratory data analysis.
The practical results of the research indicate that the concept has some advantages over its closest analogues and identify several scientific problems that can be addressed in subsequent studies. Additionally, the research tools were implemented as a chatbot system in the Telegram environment.
Keywords: exploratory data analysis, prompt engineering, large language models, chatbot systems, “meaning” space.
DOI: http://dx.doi.org/10.30970/eli.25.9