Publication type: conference paper, conference abstract, or article in conference proceedings
Conference: Proceedings II International Scientific Conference on Advances in Science, Engineering and Digital Education (ASEDU-II-2021); Krasnoyarsk
Year of publication: 2022
DOI: 10.1063/5.0106058
Abstract: Data analysis models typically operate on numeric features, so text must be converted from its string representation to a numeric one before existing models can be applied to it. This numeric representation is called a vector representation or vector model, and the transformation process is called vectorization. The recent appearance of a variety of text vectorization methods based on neural network approaches to forming word embeddings creates a need for a comparative analysis of vectorization approaches in order to identify promising directions for development. The paper presents the results of an experimental study of various approaches to text vectorization, as well as the results of running classification algorithms with different vectorization approaches. It shows that pre-trained text vectorization models provide the maximum classification accuracy in a number of cases, and that, among the machine learning methods tested, logistic regression is best suited to the task.
Journal: Proceedings II International Scientific Conference on Advances in Science, Engineering and Digital Education (ASEDU-II-2021)
Journal issue: 2647 А
Page numbers: 50031
Place of publication: Krasnoyarsk
Publisher: AIP PUBLISHING
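
Illustrative note: the abstract defines vectorization as converting the string representation of text into numeric features. A minimal sketch of that idea, assuming a simple bag-of-words count model, might look like the following. This is not one of the methods evaluated in the paper, and the function names `build_vocabulary` and `vectorize` are hypothetical, chosen only for this example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index.

    Tokens are sorted so the column order is deterministic.
    """
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return a list of token counts, one entry per vocabulary column.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(document.lower().split())
    # Dict iteration follows insertion order, which matches the column indices.
    return [counts.get(tok, 0) for tok in vocabulary]


# Assumed toy corpus, for illustration only.
documents = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
```

Each document becomes a fixed-length numeric list that a classifier such as logistic regression could take as input. Pre-trained embedding models, which the abstract reports as giving the best accuracy in a number of cases, replace these sparse counts with dense learned vectors.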