Comparative analysis of the efficiency of classical and neural network approaches for text vectorization in solving classification problems: conference paper, abstract

Description

Publication type: conference paper, abstract, article in conference proceedings

Conference: Proceedings II International Scientific Conference on Advances in Science, Engineering and Digital Education (ASEDU-II-2021); Krasnoyarsk

Year of publication: 2022

DOI: 10.1063/5.0106058

Abstract: Data analysis models generally work with numeric features, so text data must be converted from its string representation to a numeric one before existing models can be applied to it. This representation is called a vector representation or vector model, and the conversion process is called vectorization. Because a variety of text vectorization methods based on neural network word embeddings have appeared in recent years, a comparative analysis of vectorization approaches is needed to identify promising directions for development. The paper presents the results of an experimental study of various text vectorization approaches, together with the performance of classification algorithms combined with these different vectorization approaches. It is shown that pre-trained text vectorization models in a number of cases yield the highest classification accuracy, and that among the machine learning methods tested, logistic regression is best suited to the task.
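The abstract does not say which classical vectorizers or classifier settings were used in the study. As a rough illustration of the "classical" side of such a comparison only, here is a minimal, dependency-free Python sketch, assuming whitespace tokenization, raw term counts (bag of words), a smoothed TF-IDF weighting, and a logistic regression trained by plain batch gradient descent. All function names, the toy sentences, and the hyperparameters (`lr`, `epochs`) are hypothetical; this is not the authors' experimental setup, and a real study would more likely rely on an established library.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocabulary(docs):
    # Map each distinct token to a column index (sorted for determinism).
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(doc, vocab):
    # Count vector: entry i is how often vocabulary token i occurs in doc.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] = float(n)
    return vec

def fit_idf(docs, vocab):
    # Smoothed idf = ln((1 + N) / (1 + df)) + 1, one common variant.
    df = [0] * len(vocab)
    for doc in docs:
        for tok in set(tokenize(doc)):
            if tok in vocab:
                df[vocab[tok]] += 1
    n_docs = len(docs)
    return [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]

def tf_idf(doc, vocab, idf):
    # Weight raw counts by idf so that very common tokens count for less.
    return [tf * w for tf, w in zip(bag_of_words(doc, vocab), idf)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; returns (weights, bias).
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(doc, vocab, idf, w, b):
    # Probability of the positive class for a new document.
    return sigmoid(dot(w, tf_idf(doc, vocab, idf)) + b)

# Toy usage with hypothetical sentiment-like data.
train_docs = ["good great film", "great good acting", "bad awful film", "awful bad plot"]
train_labels = [1, 1, 0, 0]
vocab = build_vocabulary(train_docs)
idf = fit_idf(train_docs, vocab)
X = [tf_idf(d, vocab, idf) for d in train_docs]
w, b = train_logistic_regression(X, train_labels)
print(bag_of_words("good film", vocab))
print(predict_proba("good film", vocab, idf, w, b) > 0.5)
```

Replacing `tf_idf` with a function that, for example, averages pre-trained word vectors would give a neural-embedding counterpart while leaving the classifier unchanged; that variant is not shown because it depends on an external embedding model.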

Publication

Journal: Proceedings II International Scientific Conference on Advances in Science, Engineering and Digital Education (ASEDU-II-2021)

Issue: 2647 A

Page numbers: 50031

Place of publication: Krasnoyarsk

Publisher: AIP PUBLISHING

Authors

  • Sherstnev P. A. (Reshetnev Siberian State University of Science and Technology)
  • Polyakova A. S. (Reshetnev Siberian State University of Science and Technology)
  • Lipinskiy L. V. (Reshetnev Siberian State University of Science and Technology)

Indexed in databases