METRIC-BASED COMPARISON OF FINE-TUNED LLAMA 2 AND MIXTRAL LARGE LANGUAGE MODELS FOR INSTRUCTION TASKS

Bohdan Pavlyshenko, I. Bulka

Abstract


The paper presents a comprehensive comparative study of two advanced Large Language Models (LLMs), LLaMA 2 and Mixtral, with a specific focus on their performance on instruction tasks. Both models were fine-tuned with the LoRA and QLoRA techniques on extensive instruction datasets, using Parameter-Efficient Fine-Tuning (PEFT) on NVIDIA A100 Tensor Core GPU instances. Fine-tuning was carried out with the Hugging Face and PyTorch frameworks, keeping the training parameters as similar as possible to ensure a fair comparison. Inference was then performed on data not used during training, testing the models' ability to generalize to new, unseen inputs and providing a more robust evaluation of their performance. An evaluation framework based on the RAGAS library was established to provide precise and reliable metrics for a comprehensive assessment of the models. The results show that while LLaMA 2 fine-tunes faster, it is more prone to overfitting; Mixtral, despite requiring more training time, scores higher in the evaluations, making it a more dependable tool for instruction tasks.
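
As an illustration of the fine-tuning setup described above, the following is a minimal sketch of QLoRA-style parameter-efficient fine-tuning with the Hugging Face Transformers and PEFT libraries. The base model name, target modules, and hyperparameters are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal QLoRA/PEFT sketch (illustrative, not the authors' exact script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumption: the paper may use a different size/variant

# 4-bit quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters are the only trainable parameters
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,      # illustrative values
    target_modules=["q_proj", "v_proj"],         # illustrative choice of attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The adapted model would then be trained on the instruction dataset with a
# standard Hugging Face Trainer / SFTTrainer loop.
```

Similarly, the sketch below shows a metric-based evaluation with the RAGAS library, as mentioned above. The dataset rows, column schema, and chosen metrics are illustrative assumptions, and RAGAS additionally needs a judge LLM configured (e.g. an OpenAI API key) to compute the scores.

```python
# Minimal RAGAS evaluation sketch (illustrative data and metrics).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

eval_data = {
    "question": ["What does LoRA change during fine-tuning?"],
    "answer": ["LoRA trains small low-rank adapter matrices instead of all weights."],
    "contexts": [["LoRA injects trainable low-rank matrices into attention layers."]],
    "ground_truth": ["LoRA updates low-rank adapters while the base weights stay frozen."],
}

result = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy],  # scores in [0, 1]
)
print(result)
```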

Keywords: LLMs, PEFT, LoRA, QLoRA, Mixtral, LLaMA, LLM fine-tuning



DOI: http://dx.doi.org/10.30970/eli.26.2
