WHO AND ABOUT WHAT SPEAKS IN "CHEERFUL" AND "SAD" TEXTS: IN SEARCH OF DISCRIMINATION FEATURES IN TEXTS OF DIFFERENT EMOTIONAL TONALITIES : scholarly publication

Description

Publication type: journal article

Year of publication: 2019

DOI: 10.15826/izv2.2019.21.4.078

Keywords: sentiment analysis, emotional tonality, Internet texts, machine learning, lexical combinatorics, syntactical combinations, text class feature

Abstract: This article focuses on the peculiarities of lexical and syntactic combinability of the Russian verb ("to speak") in Russian Internet texts of different emotion classes. It aims to substantiate and validate the use of the established combinability characteristics of the lexeme as discriminant features for automatically detecting eight emotional tonalities in Russian-language Internet texts. The authors draw on a collection of texts from the (The Overhead) public page on the vk.com social network. Using the eight-class classification of emotions proposed by Lövheim, the researchers assign each text in their selection, whose total volume exceeds a million tokens, to a particular emotion by referring to the corresponding hashtags and to the emotion mapping of the texts carried out by 36 assessors, native speakers of Russian aged 19 to 45. The statistical analysis, which includes the term frequency-inverse document frequency measure (TF-IDF) and an analysis of lexeme frequency in eight sub-corpora, shows that the Russian verb does not have the same relevance in all sub-corpora: in four of them it demonstrates a high relative frequency and a significant statistical specificity, but in the remaining four it does not. Using the tools of corpus linguistics, the authors show that to automatically attribute texts to a certain emotion class, it is essential to take into account the following peculiarities of the lexical and syntactic combinability of the verb: a high percentage of subjective syntactic connections; the frequency of particular lexemes (e.g. for the classes / ) and the total frequency of the lexemes belonging to one particular lexico-semantic group functioning as the subject of the verb; the frequency of separate collocations (e.g. for the / class); the frequency of separate syntaxemes (e.g. " / lemma [ ]" for the / class); the frequency of competing syntaxemes in the specific lexemes and collocations in the position of its subject; and the frequency of the syntaxemes "lemma [ ], " and "lemma [ ]: (direct speech)", which mark the author's tendency to focus on the content of what is being said in the form of direct and reported speech. When applied as parameters of the classifier, the discriminant features increased the accuracy of classification for some emotion classes of texts.
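The frequency measures named in the abstract can be illustrated with a minimal sketch. It assumes each sub-corpus is already lemmatized and is treated as a single document; the function names, the toy data, the English placeholder lemma "speak" and the smoothed IDF formula are all assumptions made for illustration and are not taken from the article.

```python
import math
from collections import Counter


def relative_frequency(tokens, lemma):
    """Share of tokens in one sub-corpus that equal `lemma`."""
    return Counter(tokens)[lemma] / len(tokens) if tokens else 0.0


def tf_idf(subcorpora, lemma):
    """Score `lemma` in each sub-corpus, treating every sub-corpus as one document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1. This weighting is an
    assumption of the sketch, not necessarily the one used in the article.
    """
    n_docs = len(subcorpora)
    # Number of sub-corpora in which the lemma occurs at least once.
    doc_freq = sum(1 for tokens in subcorpora.values() if lemma in tokens)
    idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
    return {
        label: relative_frequency(tokens, lemma) * idf
        for label, tokens in subcorpora.items()
    }


# Hypothetical toy data: two emotion classes with placeholder English lemmas.
toy = {
    "class_a": ["speak", "friend", "speak", "day"],
    "class_b": ["walk", "day", "rain", "speak"],
}
scores = tf_idf(toy, "speak")
```

Comparing the values in `scores` across classes gives a rough picture of where a lemma is statistically specific; a real study would work with all eight sub-corpora and a proper lemmatizer.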

Links to full text

Publication

Journal: IZVESTIYA URALSKOGO FEDERALNOGO UNIVERSITETA-SERIYA 2-GUMANITARNYE NAUKI

Journal issue: Vol. 21, Is. 4

Pages: 219-234

Journal ISSN: 2227-2283

Place of publication: EKATERINBURG

Publisher: URAL FEDERAL UNIV

Persons

  • Kolmogorova Anastasia (Siberian Fed Univ, Romance Languages & Appl Linguist Dept, 82a Svobodny Ave, Krasnoyarsk 660041, Russia)
  • Kalinin Alexander A. (Siberian Fed Univ, Romance Languages & Appl Linguist Dept, 82a Svobodny Ave, Krasnoyarsk 660041, Russia)
  • Malikova Alina (Siberian Fed Univ, Romance Languages & Appl Linguist Dept, 82a Svobodny Ave, Krasnoyarsk 660041, Russia)

Database indexing