USING MACHINE LEARNING AND SEMANTIC FEATURES IN INTELLECTUAL ANALYSIS OF TEXT DATA
Abstract
This paper studies the use of supervised machine learning together with semantic features of text. The semantic features considered are semantic fields, thematic fields, singular value decomposition (SVD) components and latent Dirichlet allocation (LDA) components. Semantic fields are groups of lexemes united by a specified concept. Thematic fields are groups of lexemes whose text frequency in a specified class of documents is twice as high as in the whole set of documents; features based on thematic fields make it possible to distinguish document classes accurately. Classification was performed with the Random Forest algorithm and with deep learning neural networks. The networks analyzed include networks with fully connected layers fed with quantitative semantic features; networks with an embedding layer for text representation followed by a bidirectional LSTM layer, which takes word order and word combinations into account; and a combined network consisting of a recurrent subnetwork that processes the text and a subnetwork that processes the numerical semantic features. Classification quality was scored with precision, recall and f1-score. The feature sets studied combine semantic and thematic field features, SVD components of the TF-IDF matrix and LDA components. The paper also considers numeric regression with text as input, for the case of product analytics based on product text descriptions. For this regression, a combined neural network with an LSTM layer for the text and a fully connected layer for the numerical semantic features was used. The results show that such a network can find patterns in product text descriptions: the accuracy of price prediction improves over the training iterations of the combined network.
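The thematic-field rule described above can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: it uses a naive regex tokenizer in place of lemmatization into lexemes, compares each token's relative frequency inside a class with its relative frequency in the whole corpus against a threshold `ratio=2.0`, and turns the resulting fields into document features as the share of a document's tokens that belong to each class's field. The function names `thematic_fields` and `field_features`, the `min_count` filter and the share-of-tokens feature are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive lowercase word tokenizer stands in for the
    # lemmatization into lexemes that a real pipeline would perform.
    return re.findall(r"\w+", text.lower())


def relative_freqs(docs):
    """Return (relative frequency, raw count) of every token in a list of documents."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}, counts


def thematic_fields(docs, labels, ratio=2.0, min_count=1):
    """Hypothetical helper: for each class, keep the tokens whose relative
    frequency inside that class is at least `ratio` times their relative
    frequency in the whole corpus. `min_count` (an added assumption) can
    drop tokens seen too rarely in the class."""
    corpus_freq, _ = relative_freqs(docs)
    fields = {}
    for cls in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == cls]
        class_freq, class_counts = relative_freqs(class_docs)
        fields[cls] = {
            tok for tok, f in class_freq.items()
            if class_counts[tok] >= min_count and f >= ratio * corpus_freq[tok]
        }
    return fields


def field_features(text, fields):
    """Hypothetical feature: share of a document's tokens in each class's thematic field."""
    toks = tokenize(text)
    n = len(toks) or 1  # avoid division by zero for empty documents
    return {cls: sum(t in field for t in toks) / n for cls, field in fields.items()}


if __name__ == "__main__":
    # Toy corpus with three classes (illustrative data only).
    docs = [
        "the team scored a goal",
        "a late goal won the match",
        "the stock price rose",
        "the market price fell",
        "rain and wind all day",
        "the wind brought rain",
    ]
    labels = ["sport", "sport", "finance", "finance", "weather", "weather"]
    fields = thematic_fields(docs, labels)
    print(sorted(fields["finance"]))  # ['fell', 'market', 'price', 'rose', 'stock']
    print(field_features("the stock price fell", fields))
    # {'finance': 0.75, 'sport': 0.0, 'weather': 0.0}
```

A word that occurs in only one class has a class-to-corpus frequency ratio equal to the corpus size divided by the class size, so under this sketch it passes the twice-as-high threshold only when its class holds at most half of the corpus tokens; in the toy corpus every class is below that share, while the shared word "the" is excluded from all fields.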
Using a wide range of semantic features in the intellectual analysis of text data diversifies the analytical approaches and enlarges the semantic feature space, which matters in analytical problems where the predictive power of individual features can change over time.
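The precision, recall and f1-scores used for classification scoring follow the standard per-class definitions: precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R). A minimal pure-Python sketch is given below; the function name `per_class_scores` is hypothetical, scoring undefined ratios as 0.0 is an assumption, and in practice a library routine such as scikit-learn's `precision_recall_fscore_support` computes the same quantities.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Hypothetical helper: (precision, recall, f1) for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for cls in set(y_true) | set(y_pred):
        # Assumption: a ratio with a zero denominator is scored as 0.0.
        prec = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        rec = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[cls] = (prec, rec, f1)
    return scores


if __name__ == "__main__":
    y_true = ["sport", "sport", "finance", "finance"]
    y_pred = ["sport", "finance", "finance", "finance"]
    for cls, (p, r, f) in sorted(per_class_scores(y_true, y_pred).items()):
        print(f"{cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
    # finance: precision=0.667 recall=1.000 f1=0.800
    # sport: precision=1.000 recall=0.500 f1=0.667
```

Reporting the three scores per class, rather than accuracy alone, shows which classes a given feature set confuses, which is useful when comparing combinations of semantic field, SVD and LDA features.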
Keywords: text analytics, text semantic features, text classification, neural networks.
Full text: PDF (Ukrainian)
DOI: http://dx.doi.org/10.30970/eli.13.1