INFLUENCE OF LEARNING RATE PARAMETER IN DIFFERENT TRANSFORMER MODELS IN TEXT CLASSIFICATION TASK
Abstract
This article examines the influence of the learning rate parameter on the fine-tuning results of pre-trained transformer models: BERT, DistilBERT, ALBERT, and XLM-RoBERTa. A dataset from the HuggingFace portal is used for model training and testing. The dataset contains labeled data for both training and testing, as well as unlabeled data intended for unsupervised models and algorithms. Instead of implementing the training and evaluation loops directly, the Trainer and TrainingArguments classes from the HuggingFace Transformers library were used, and batches were formed with a DataCollator class. Several measures of training efficiency were considered: training time and the values of the training and validation loss functions. The results make it possible to compare the efficiency of each of the observed models in a binary text classification task, either standalone or in an ensemble with other models.
Keywords: transformers, binary text classification, BERT, ALBERT, DistilBERT, XLM-RoBERTa.
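For illustration, the following is a minimal sketch of the kind of fine-tuning setup described in the abstract, built on the HuggingFace Trainer API. The checkpoint name (distilbert-base-uncased), the IMDB dataset, and all hyperparameter values are assumptions chosen for the example, not values taken from the article; the same pattern applies to BERT, ALBERT, and XLM-RoBERTa, with learning_rate being the parameter under study.

```python
# Illustrative sketch only: checkpoint, dataset and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, TrainingArguments, Trainer)

checkpoint = "distilbert-base-uncased"   # assumed; BERT, ALBERT, XLM-RoBERTa are used the same way
dataset = load_dataset("imdb")           # assumed binary dataset with train/test and an unsupervised split

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # Truncate to the model's maximum length; padding is deferred to the data collator.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)  # pads each batch dynamically

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

args = TrainingArguments(
    output_dir="lr-study",
    learning_rate=2e-5,              # the parameter under study; swept over several values
    per_device_train_batch_size=16,
    num_train_epochs=2,
    evaluation_strategy="epoch",     # report validation loss every epoch
    logging_strategy="epoch",        # report training loss every epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    tokenizer=tokenizer,
)

trainer.train()  # training logs provide the loss values and timing used for comparison
```

Repeating this run with different learning_rate values (and different checkpoints) yields the training time and loss curves that the article compares across models.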
DOI: http://dx.doi.org/10.30970/eli.21.4