ALGORITHM FOR SEARCHING AND ACQUISITION OF KNOWLEDGE BASED ON TECHNOLOGIES FOR PROCESSING AND ANALYZING TEXTS IN NATURAL LANGUAGE

Abstract

The article addresses the topical scientific problem of increasing the efficiency of processing and analyzing textual information in knowledge search and acquisition tasks. The task is relevant because effective tools are needed to process the huge accumulated volume of poorly structured data, which contains important and sometimes hidden knowledge required to build effective control systems for complex objects of various kinds. The knowledge search and acquisition algorithm proposed by the author uses low-level deterministic rules that simplify text by excluding words that do not affect its meaning. The algorithm relies on domain elaboration, which yields lists of domain-specific words and thereby enables high-quality text simplification. The input data are streams of textual information (profile descriptions) extracted from online recruiting platforms; the output consists of sentences formed as "subject-verb-object" triples that reflect the granules of knowledge obtained during text processing. This order of sentence constituents is used because it is the most widespread in the Russian language, although other orders can occur in the texts themselves without loss of the general meaning. The main idea of the algorithm is to split a large text corpus into sentences and then filter the resulting sentences by keywords entered by the user. The sentences are then split into components and simplified depending on the type of each component (verbal or nominal). Marketing was used as the example domain in this work, with "social media" as the keywords.
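The first stages described above (sentence splitting, keyword filtering, and removal of words that do not affect meaning) can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the regular expressions, and the `FILLER_WORDS` list are assumptions made for illustration, English text is used for readability, and the splitting into verbal and nominal components and the formation of subject-verb-object triples would require a syntactic parser and are not shown.

```python
import re

# Hypothetical list of words assumed not to affect meaning; in the article
# such lists come from domain elaboration, which is not reproduced here.
FILLER_WORDS = {"very", "really", "also", "just", "actively", "successfully"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def filter_by_keyword(sentences, keyword):
    """Keep only sentences containing the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [s for s in sentences if kw in s.lower()]


def simplify(sentence, filler=FILLER_WORDS):
    """Drop filler words and punctuation, keeping the original word order."""
    words = re.findall(r"[A-Za-z0-9'-]+", sentence)
    return " ".join(w for w in words if w.lower() not in filler)


def extract(text, keyword):
    """Run the three stages: split, filter, simplify."""
    return [simplify(s) for s in filter_by_keyword(split_sentences(text), keyword)]


if __name__ == "__main__":
    profile = (
        "I really managed social media campaigns. "
        "I enjoy hiking. "
        "We also grew Social Media reach!"
    )
    for sentence in extract(profile, "social media"):
        print(sentence)
```

A deterministic word list keeps every decision traceable to a rule, which is the property the abstract contrasts with neural-network approaches; the cost is that the quality of the result depends on how well the list fits the domain.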
The author has developed an algorithm for knowledge search and acquisition based on natural language text processing and analysis technologies and has implemented it in software. Several metrics were used to evaluate efficiency: the Flesch-Kincaid index, the Coleman-Liau index, and the Automated Readability Index. The computational experiments confirmed the effectiveness of the proposed algorithm in comparison with analogues that use neural networks to solve similar problems.
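For reference, the three readability metrics can be computed as in the sketch below, which uses the standard published formulas. Several points are assumptions rather than details from the article: the Flesch-Kincaid index is taken to be the grade-level variant, syllables are counted with a rough vowel-group heuristic suited only to English, only alphabetic tokens are counted as words, and all function names are illustrative.

```python
import re


def _words(text):
    """Alphabetic tokens only (a simplifying assumption)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    return words


def _sentence_count(text):
    """Count runs of terminal punctuation; treat unpunctuated text as one sentence."""
    return max(1, len(re.findall(r"[.!?]+", text)))


def _syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    words = _words(text)
    syllables = sum(_syllables(w) for w in words)
    return (0.39 * len(words) / _sentence_count(text)
            + 11.8 * syllables / len(words) - 15.59)


def coleman_liau_index(text):
    words = _words(text)
    letters_per_100 = sum(len(w) for w in words) / len(words) * 100
    sentences_per_100 = _sentence_count(text) / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def automated_readability_index(text):
    words = _words(text)
    chars = sum(len(w) for w in words)
    return (4.71 * chars / len(words)
            + 0.5 * len(words) / _sentence_count(text) - 21.43)


if __name__ == "__main__":
    sample = "We grew social media reach. The team ran three campaigns."
    print(flesch_kincaid_grade(sample))
    print(coleman_liau_index(sample))
    print(automated_readability_index(sample))
```

All three scores approximate the school grade level needed to read a text, so a lower value after simplification indicates an easier text. The coefficients were calibrated on English, so applying them to Russian text would require adapted variants.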

Published:

2024-11-10

Issue:

Section:

SECTION I. INFORMATION PROCESSING ALGORITHMS