No. 8 (2018)

Published: 2019-04-04

SECTION I. PRINCIPLES OF CONSTRUCTION OF HIGH-PERFORMANCE COMPUTING SYSTEMS

  • HETEROGENEOUS SUPERCOMPUTING SYSTEMS BASED ON OPENCL SOFTWARE PLATFORM

    A.P. Antonov, V.S. Zaborovskij, I.O. Kiselev, K.A. Antonov
    6-18
    Abstract

    The growth in the demands that society places on computing systems is not matched by the growth in their performance. This is due to VLSI integration approaching its physical limit, as well as to the use of general-purpose architectures for highly specialized tasks. The first problem is fundamental and cannot be solved in the short term, until mass-market quantum or optical computers appear. The second one can be addressed today. The high-performance computing world is moving from clusters based on general-purpose processors to heterogeneous structures such as GPUs and ASICs. However, the most promising approach is to use hardware-reconfigurable computers based on FPGAs. They combine the advantages of an ASIC, namely low power consumption and high efficiency on a specific task, with the flexibility of a GPU, i.e. the configuration can be changed by software. The current paper shows the advantages of using reconfigurable computers over traditional approaches. It also describes a unique reconfigurable computing board based on an array of four Xilinx FPGAs. A board-support package, which allows creating configurations in the OpenCL language, has been developed for this board. OpenCL is a cross-platform standard for high-performance parallel computing. We are integrating the board into a reconfigurable supercomputer and developing an intelligent profiler intended to determine which computational unit best suits a given task.
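
    To give a concrete flavor of the OpenCL programming model mentioned above, here is a minimal, hypothetical vector-addition example using the PyOpenCL bindings; it is purely illustrative and is not the board-support package described in the paper.

```python
# Minimal OpenCL example (illustrative; not the paper's board-support package).
# Requires the pyopencl package and any OpenCL-capable device (CPU, GPU, FPGA).
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);      // one work-item per vector element
    out[gid] = a[gid] + b[gid];
}
"""

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
out = np.empty_like(a)

ctx = cl.create_some_context()                 # picks an available device
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

prg = cl.Program(ctx, KERNEL_SRC).build()
prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```

    On an FPGA target the same kernel source is compiled offline into a bitstream rather than built at run time; enabling that flow is exactly the role of a board-support package.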

  • AUTOMATION OF HIGH PERFORMANCE COMPUTER SYSTEMS DESIGN PROCESS

    V.S. Gorbunov, A.I. Tupitsyn
    18-29
    Abstract

    Currently, an effective transition to a new level of supercomputer productivity and effectiveness is impossible without digital transformation of the industry that creates and applies high-performance computer systems. The task is complicated by the fact that for supercomputers the duration of a new device's development project matters: established practice suggests that a supercomputer must be created in one, at most two, years. Solving such problems on demand is impossible without digital models of the devices, i.e. high-performance computer systems (supercomputers) and the data centers built on them. In this paper the authors propose a concept for automating the supercomputer design process. The automation is based on an interdependent set of digital models covering the supercomputer life-cycle stages: scientific research and development (conceptual design), engineering design, manufacturing, and use of the device. The authors suggest basing the digital models of a computing system on digital models of its components. During the development of these sub-models it is necessary to apply contemporary artificial-intelligence technologies: the sub-models must be self-training systems that use information from a set of data sources, such as manufacturers of the electronic component base, monitoring of the Internet information space, and expert input. The authors suggest using ontologies to coordinate the digital models across the stages of the device life cycle. The application of digital models during the generation and selection of supercomputer design solutions is described in detail, along with a practical realization of the proposed automation in the Automatized System of Supercomputers Design (ASSD) developed at FSUE «RDI «Kvant». When designing a supercomputer architecture, this system produces alternative architecture choices (design solutions) taking into account the ontology of the future device and the available information on unified construction components and original products. To support cooperation between participants of the supercomputer design process, ASSD implements automated import of descriptions of the sub-processes of the supercomputer design workflow into a System of Project Control and Management (SPCM). As a result of this import, SPCM creates the necessary tasks for each participant of the design process, enumerating the set of activities they have to perform. Applying the proposed approach to the creation of problem-oriented high-performance computing devices will make it possible to build problem-oriented supercomputers and devices in Russia, using a native component and microelectronics base, within the required timescale.

  • THE COMMUTATION SCHEME OF THE PARALLEL COMPANION TRANSFORMATIONS FOR A SPECIALIZED COMPUTING DEVICE

    E.A. Titenko, A.V. Kripachev, A.L. Marukhlenko
    29-38
    Abstract

    The article shows how to reduce the time spent generating combinations of elements of a set. The elements of the set are formed from samples (left-hand parts) of production rules. The main task is to build time-efficient schemes (algorithms) for the parallel generation of combinations of array elements. For production systems, such schemes are necessary to activate the subset of productions applicable to the character data at the current step. The well-known parallel-bubble algorithm is taken as the basis and developed further. The "parallel bubble" switching scheme consists of two alternating variants of pairwise element commutation, based on locally pairing array elements with adjacent indices. Such local pairing leads to "small" displacements of elements along the array and a regular pattern of pair generation. In each pair, a compare-exchange operation on the operands is performed. For production systems, the comparison operation reduces to searching for sample intersections and forming a list of conflicting words. The reduction in generation time is based on constructing switching variants with distributed pairing of elements at a step equal to 4. The developed switching scheme uses local pairing of elements on odd switching steps; on even steps, a switching accelerator with distributed pairing is applied. The performance of the developed switching scheme has been simulated on typical tasks of sorting and complete enumeration of element pairs. Compared with the "parallel bubble" scheme, the time costs are reduced by 15–18 %. A linear dependence of sorting time with a slope of less than 1 has been determined, which allows the switching circuit to be used in large-scale production systems. Local and distributed communications in the switching scheme preserve the regularity property. This feature enables a hardware implementation of the circuit as a naturally scalable parallel switch. The scheme can be used in a specialized production device to decompose a production system into independent subsets of productions.
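
    For reference, the baseline "parallel bubble" (odd-even transposition) scheme can be sketched as follows; the paper's stride-4 distributed accelerator on even steps is only indicated by a comment, since its exact pairing rule is not given in the abstract.

```python
def parallel_bubble_sort(a):
    """Odd-even transposition sort: the baseline "parallel bubble" scheme.

    On each step, disjoint pairs of adjacent elements are compared and
    exchanged; all pairs of a step are independent, so in hardware they
    are processed by parallel compare-exchange units.
    """
    a = list(a)
    n = len(a)
    for step in range(n):
        start = step % 2                    # alternate the two pairing variants
        for i in range(start, n - 1, 2):    # disjoint pairs -> parallel units
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        # The scheme developed in the paper replaces even steps with a
        # "distributed" pairing of elements at stride 4, which is what
        # yields the reported 15-18 % reduction in time costs.
    return a

assert parallel_bubble_sort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
```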

  • THE METHOD OF ARCHITECTURALLY INDEPENDENT HIGH-LEVEL SYNTHESIS OF VLSI

    O.V. Nepomnyaschy, A.I. Legalov, I.N. Ryzhenko
    38-47
    Abstract

    The problem of high-level design of complex functional circuits and systems intended for VLSI implementation is considered. The basic shortcomings of existing approaches are identified and a conceptually new method of project synthesis is proposed. The method is based on the functional-streaming paradigm of parallel computing and allows architecture-independent development of algorithms for VLSI operation. A computation model is proposed that uses a number of intermediate structures, namely control, information, and HDL graphs. The requirements for a functional-streaming programming language are formulated and, taking into account the specifics of the tasks to be solved, the language is selected and modified. The key points of the language semantics, the principles of parallelism transformation, and the formation of intermediate representations in the transition to the target platform are described. A route and an algorithm for high-level synthesis have been developed. The basic requirements for architecture-independent tools are identified, software tools are implemented, and a number of test projects have been executed.

  • IP-CORE FOR A FUNCTIONAL-ORIENTED PROCESSOR WITH VLIW-RISC ARCHITECTURE

    N.A. Lookin
    48-58
    Abstract

    The analysis of real-time control system algorithms shows that, to ensure the required performance of embedded computer systems, the processing time of specific algorithmic procedures must be minimized. VLSI implementation of special-purpose embedded-processor architectures makes it possible, through hardware implementation of such procedures, to significantly reduce processing time. The main goals of synthesizing the IP-core of a function-oriented processor (FOP) in this case are the maximum possible performance on the basic procedure and minimal overhead for program operation. An IP-core based on a combination of a RISC architecture and VLIW-class instructions makes it possible to achieve these goals when computing the scalar product of vectors, adopted as the basic procedure. A high-performance 64-bit processor core has been developed that performs any operation of the instruction set in at most one clock cycle, ensuring minimal computing time for the scalar product. This enables the fastest possible implementation of navigation algorithms for control systems of rocket and space technology and aircraft. The article discusses the results of developing technological software intended for the design and verification of application programs for the function-oriented processor. This software includes a specialized assembly language DAPLANG, created on the basis of the ML-1 macro generator, as well as a special library for interfacing the FOP with the Lua scripting language. The results of experimental studies of the FOP for verifying its architecture and instruction set are given.
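
    As a worked illustration of the basic procedure: a length-n scalar product reduces to n multiply-accumulate (MAC) steps, so a core that retires any instruction in at most one clock cycle computes it in about n cycles. This is a sketch under that assumption, not the DAPLANG code of the paper.

```python
def scalar_product(x, y):
    """Scalar product as a chain of multiply-accumulate (MAC) operations.

    In a function-oriented processor that retires one instruction per
    clock, each loop iteration below corresponds to a single MAC cycle.
    """
    acc = 0.0
    cycles = 0
    for xi, yi in zip(x, y):
        acc += xi * yi        # one fused MAC -> one clock cycle
        cycles += 1
    return acc, cycles

dot, cycles = scalar_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
assert dot == 32.0 and cycles == 3
```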

SECTION II. DISTRIBUTED AND CLOUD COMPUTING

  • MULTI-AGENT ALGORITHM FOR CREATING A RESIDUAL PROBLEM-SOLVING SCHEME IN DISTRIBUTED APPLIED SOFTWARE PACKAGES

    A.G. Feoktistov, R.O. Kostromin, I.A. Sidorov, S.A. Gorsky
    59-69
    Abstract

    Nowadays, basic software tools that implement technologies for organizing computations in high-performance computing systems provide a potential basis for the mass creation and use of parallel and distributed applications. Tools for creating applied software packages and workflow support systems are being actively developed and applied in practice. However, an analysis of their practical application leads to the conclusion that the fault tolerance of problem-solving processes in distributed applied software packages must be increased for problems that include sets of interrelated subproblems. This becomes especially urgent when problems are solved in a heterogeneous distributed computing environment. Clusters, including hybrid clusters with heterogeneous nodes, are the main components of such an environment; high-performance servers, storage systems, personal computers, and other computing elements complement its infrastructure. The paper presents an adaptive multi-agent algorithm intended for redistributing jobs over the resources of such an environment. The algorithm is used when restarting the problem-solving process in distributed applied software packages after a software or hardware failure. In contrast to the well-known fault-tolerance algorithms for distributed computing used in workflow management systems, this algorithm is based on program specialization methods for creating and executing a residual problem-solving scheme, and it actively applies meta-monitoring of computational resources. A comparative analysis of experimental results on semi-natural modeling of fault-tolerant scheme execution for distributed applied software packages by various meta-schedulers demonstrated the advantage of the proposed approach to multi-agent management in a heterogeneous distributed computing environment.

  • THE METHOD OF RESOLUTION OF THE CONFLICT IN THE MULTIAGENT CONTROL SYSTEM OF THE AUTONOMOUS UNDERWATER VEHICLE WITH THE USE OF DISTRIBUTED CALCULATIONS

    L.A. Martynova
    69-83
    Abstract

    The purpose of the study is to increase the efficiency of the unattended underwater vehicle (AUV)functioning by resolving the conflict in its multi-agent control system, which is related to power consumption by the AUV subsystems. In the transition from exclusive use of the battery to additional use of the battery, arose the need to resolve the contradiction between the provision of energy resources by diverse sources and its consumption. The complexity of solving the problem consisted in the unpredictability of the use of various speed regimes that affect the AUV’s energy consumption. The proposed method for resolving the conflict is based on the decomposition of energy resource consumers and forecasting the possibility of accomplishing the task assigned to the AUV related to overcoming a given distance within a given time. For this purpose, we devel-oped a method based on the following algorithms of: forecasting the sufficiency of energy re-sources to overcome a given distance; determining the permissible current consumption of energy resources and the corresponding speed regime; estimating the time needed to overcome the re-maining distance. The listed algorithms are characterized by a set of parameters that have been optimized depending on the prevailing conditions during the execution of the task by the device. As a criterion of optimality, the probability of overcoming a given distance within a given time is used. When optimizing the parameters the following items have been taken into account: current battery level; current level of the electrochemical generator’s reserve; time during which the de-vice has already overcome part of the specified distance. According to these data, the following have been determined successively: remaining distance; time taken to overcome the remaining distance; reserve of energy resource expended on overcoming the remaining distance. When driv-ing in high-speed mode, we determined the following: remaining time of consumption from the battery; energy resource consumed during this time; battery energy reserve remaining after the high-speed mode. When the vehicle is moving in its normal mode, the following parameters are calculated: consumption of energy resources from the electrochemical generator taking into ac-count the simultaneous charge of the battery; battery charge time; time to overcome the remaining distance; speed at which you need to move to overcome the remaining distance; estimation of sufficiency of an energy resource on overcoming of the remained distance; specific power consump-tion corresponding to the required speed; moment of transition to the economy run mode. Based on the results of the proposed algorithm, the current allowable energy resource consumption is determined, which corresponds to: current source (battery or electrochemical generator); current speed mode (full-time or high-speed); predicted probability of overcoming a given distance. The proposed method is tested using a specially developed mathematical model for the functioning of the apparatus and its energy supply system with two heterogeneous energy sources. The software implementation of the mathematical model allows to carry out numerical experiments using the proposed method under various hydrological conditions, the results of which show the undeniable advantage of the proposed method, which significantly increases the probability of the apparatus performing the task assigned to it.
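
    A minimal sketch of the kind of feasibility forecast described above, with hypothetical names and a simplified cubic speed-power model; the actual AUV model and its parameters are not given in the abstract.

```python
def forecast_feasibility(dist_left_m, time_left_s, energy_left_j, k_drag=80.0):
    """Check whether the remaining energy suffices to cover the remaining
    distance in the remaining time (illustrative model, not the paper's).

    Assumes propulsion power grows as k_drag * v**3, a common rough model.
    """
    if time_left_s <= 0:
        return False, float("inf")
    v_required = dist_left_m / time_left_s           # required mean speed, m/s
    power_required = k_drag * v_required ** 3        # W, hypothetical drag model
    energy_required = power_required * time_left_s   # J
    return energy_required <= energy_left_j, v_required

ok, v = forecast_feasibility(dist_left_m=18_000, time_left_s=14_400,
                             energy_left_j=5.0e8)
print(f"required speed {v:.2f} m/s, feasible: {ok}")
```

    In the paper's setting this check would be re-evaluated as conditions change, driving the choice between normal, high-speed, and economy run modes.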

  • ONTOLOGY BASED WORKLOAD ALLOCATION PROBLEM SOLVING IN FOG COMPUTING ENVIRONMENT

    A.B. Klimenko, I.B. Safronenkova
    83-94
    Abstract

    The fog computing concept is quite new but is applied almost everywhere. This is due to the intensive growth of processed data volumes, so cloud computing architectures, which were used successfully before, become insufficient under the conditions of the Internet of Things (IoT). The workload allocation problem in a heterogeneous computing environment is not new and has been solved many times. However, the known problem models neglect some special aspects of fog computing, such as the inequality of computation nodes and the mandatory participation of the cloud layer in the computing process. The current paper focuses on formalizing the workload allocation problem in view of these special aspects of fog computing, using the device "offload" strategy. In this case the task subgraph is reallocated onto some subset of computing devices of the fog layer. A constraint peculiar to fog computing is added to the workload allocation problem in a heterogeneous computing environment. This is a multicriteria optimization problem with multiple constraints determined by the system peculiarities, so the optimization problem is NP-hard, which raises the question of obtaining quality decisions under limited time. In this paper an approach is proposed that reduces the search space of the optimization problem by selecting a set of candidate computing devices. An ontological approach is used for this purpose: an ontology structure has been developed that classifies the reallocated subgraph with respect to the available resources. Rules based on the developed ontology are applied to choose candidate nodes for task subgraph allocation. This allows the solution search space to be reduced efficiently.

  • ON ORGANIZATION OF DATA COLLECTION AND PROCESSING IN THE SYSTEM OF FORECASTING OF DANGEROUS PHENOMENA FOR THE COASTAL ZONE TO APPLY THE TECHNOLOGIES OF THE DIGITAL ECONOMY

    E.V. Melnik, M.V. Orda-Zhigulina, A.A. Rodina, D.V. Orda-Zhigulina, D.Y. Ivanov
    94-103
    Abstract

    Recently, fog computing and industrial Internet of Things technologies have been actively developing. These technologies make it possible to link data services, distribute the load over available resources, and process large amounts of data in networks, which is very important for real-time monitoring and for medical databases. They link together different types of smart sensors: meteorological data, hydrological and biological monitoring data, physiological parameters of people in the coastal zone, mobile devices of the coastal population, and important user messages in social networks. This paper addresses the organization of data collection and processing in a system for monitoring and forecasting hazardous phenomena and ensuring the safety of the population and coastal infrastructure, based on such digital economy technologies as fog computing, the industrial Internet of Things, and the distributed ledger. It is shown that, within the "combined" method of organizing the system previously proposed by the authors, it is possible to compare primary meteorological data, hydrological and biological monitoring data, human physiological parameters, and text messages, photos, and video from social networks using the existing information infrastructure. A literature review and patent search were conducted, identifying the main types of data and sensors used in systems for monitoring and forecasting hazardous processes and ensuring public safety. A method for monitoring the physiological parameters of people living in the coastal zone is proposed for the hazardous-process monitoring and forecasting system.

SECTION III. MATHEMATICAL AND SOFTWARE SUPPORT

  • ON THE EFFICIENCY OF THE NOISY TEXT CORRECTION SOFTWARE DEPENDING ON THE DISTORTION TYPE

    D.A. Birin, V.A. Peresypkin, S.Y. Melnikov, I.A. Pisarev, N.N. Copkalo
    104-114
    Abstract

    The capabilities of four automatic text-correction software tools (Yandex.Speller, Afterscan, Bing Spell Check, Texterra) for correcting noisy texts are analyzed. The distortions that occur while typing text on a keyboard and during the operation of recognition systems are described. Experimental data are presented on the accuracy of correcting distorted texts obtained both by typing and as the output of real OCR systems processing low-quality images and of ASR systems operating in a noisy environment. To simulate the distortions caused by recognition systems, a two-stage model of random text distortions is proposed. At the first stage (word distortions with a given probability), a distorted word in the text is replaced with a random dictionary word at Levenshtein distance 1 or 2; the replacement word is chosen according to the uniform distribution. At the second stage (character distortions with a given probability), a distorted character is removed with probability 1/3, a random character is inserted before it with probability 1/3, or it is replaced with a random alphabet character with probability 1/3; the replacement character is chosen according to the uniform distribution. The distorted texts obtained in this way are corrected using the Yandex.Speller and Bing Spell Check software, and the percentage of correct words in the corrected text is calculated; the data are averaged over a set of texts. Results are given for the correction accuracy over the following parameter range: word-distortion probabilities from 0 to 0.9 and character-distortion probabilities from 0 to 0.5. The results show that Yandex.Speller, Bing Spell Check, and Texterra provide good-quality correction of distortions that occur while typing, but are ineffective at correcting distortions caused by recognition systems.
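
    The two-stage distortion model is concrete enough to sketch directly; the dictionary and probabilities below are hypothetical placeholders.

```python
import random
import string

def char_stage(word, p_char):
    """Stage 2: each character is, with probability p_char, deleted,
    prefixed with a random character, or replaced (1/3 chance each)."""
    out = []
    for ch in word:
        if random.random() < p_char:
            action = random.choice(("delete", "insert", "replace"))
            if action == "delete":
                continue
            if action == "insert":
                out.append(random.choice(string.ascii_lowercase))
                out.append(ch)
            else:
                out.append(random.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)

def distort_text(words, neighbors, p_word, p_char):
    """Stage 1 then stage 2. `neighbors[w]` must map a word to the list of
    dictionary words at Levenshtein distance 1 or 2 (precomputed)."""
    result = []
    for w in words:
        if random.random() < p_word and neighbors.get(w):
            w = random.choice(neighbors[w])   # uniform over close dictionary words
        result.append(char_stage(w, p_char))
    return result

# Toy usage with a hypothetical two-word neighborhood table.
print(distort_text("the cat sat".split(),
                   {"cat": ["cap", "bat"]}, p_word=0.9, p_char=0.2))
```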

  • APPLICATION OF EXACT AND LIMIT APPROXIMATIONS OF STATISTICS PROBABILITY DISTRIBUTIONS FOR THE PROBLEM OF TEXT PROCESSING

    A.K. Melnikov
    114-135
    Abstract

    In the paper we consider the application of limit and exact approximations of statistic probability distributions to the problem of selecting texts with specific statistical properties. For the selection of texts with an equiprobable distribution of symbols we use a statistical goodness-of-fit criterion, in which various approximations serve as the reference distribution of the test statistic. As extreme approximations we use limit distributions, and as exact approximations we use Δ-exact distributions, which differ from the exact distributions by no more than a specified Δ. We present the calculation results for Δ-exact distributions and show how they deviate from the limit distributions for different statistics. We consider the notion of processing efficiency for the selection of equiprobable texts, which reflects the share of wrongly selected texts, and compare the processing efficiency for exact and limit approximations of the reference distributions of test statistics. We prove that the processing efficiency does not decrease, and in many cases increases, when the exact approximation is used instead of the extreme one. To compare statistical criteria based on the same test statistic but different reference distributions, we introduce the concept of the relative efficiency of distributions, which shows the fold increase in the number of wrongly selected texts when one or another distribution is used as the reference. We show the functional connection between the concepts of processing efficiency and relative efficiency of distributions. Owing to the availability of high-performance computing facilities, which make it possible to calculate Δ-exact distributions for the required text lengths and alphabet sizes, we prove a statement about the relative efficiency of distributions that makes it possible to select, from the set of distributions of the test statistic, the reference distribution with the highest processing efficiency. In addition, we give examples of relative efficiency values for exact and extreme approximations.
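
    One plausible formalization of the relative efficiency just described (the notation is ours, not the paper's): if W(F) is the number of wrongly selected texts when F serves as the reference distribution of the test statistic, then

```latex
\[
  e_{\mathrm{rel}}(F_{\lim}, F_{\Delta}) \;=\; \frac{W(F_{\lim})}{W(F_{\Delta})},
\]
```

    so a value above 1 quantifies the fold increase in wrongly selected texts incurred by using the limit distribution instead of the Δ-exact one.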

  • PROTECTION OF COPYRIGHT TO THE IMAGE WITH THE USE OF METHODS OF MORPHOLOGICAL PROCESSING

    A.M. Abasova, L.K. Babenko
    135-145
    Abstract

    This article addresses the embedding of digital watermarks in image regions that are less likely to be modified and are therefore suitable for effective copyright protection, taking into account the destructive effects characteristic of copyright violation. The container is a color image; the digital watermark is a text containing a copyright symbol. Foreground blocks are chosen for embedding because, according to a survey, they carry the value of the image, which is especially characteristic of commercial photographs. The search for data blocks for embedding is carried out by marking the image with methods of mathematical morphology. The article also demonstrates the ability of the structuring element to play the role of key information. It is proposed to use the geometric center of each found foreground block as the reference point when embedding the digital watermark. The article presents an evaluation of the ability to correctly extract an embedded digital watermark with the proposed solution using the developed program, an analysis of the program's effectiveness, and a comparison with existing software products used for copyright protection.
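
    A minimal sketch of the block-search step using standard mathematical-morphology routines (SciPy names); the structuring element stands in for the key information mentioned above, and simple thresholding is a simplistic stand-in for the paper's foreground marking.

```python
import numpy as np
from scipy import ndimage

def foreground_embedding_points(gray, structure=np.ones((3, 3)), thresh=128):
    """Find foreground blocks by morphological opening and return the
    geometric center of each block as the watermark embedding reference.

    `structure` plays the role of the key information: marking with a
    different structuring element yields different blocks and centers.
    """
    mask = gray > thresh                                   # crude foreground mask
    opened = ndimage.binary_opening(mask, structure=structure)
    labels, n = ndimage.label(opened)
    centers = ndimage.center_of_mass(opened, labels, range(1, n + 1))
    return [(int(round(r)), int(round(c))) for r, c in centers]

img = np.zeros((64, 64), dtype=np.uint8)
img[10:30, 10:30] = 200                                    # one synthetic block
print(foreground_embedding_points(img))                    # one center, mid-block
```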

  • STUDY OF SURFACE ATOMIC DIFFUSION ON PRE-PATTERNED SI SUBSTRATES BY MOLECULAR DYNAMICS SIMULATION: MODELING WITH THE USE OF HIGHLY EFFICIENT ALGORITHMS

    P.L. Novikov, K.V. Pavskiy, A.V. Dvurechenskiy
    145-153
    Abstract

    There is growing interest among researchers in the creation of spatially ordered arrays of quantum dots (QDs). These structures are promising as an element basis for thermally stable solid-state lasers, field-effect transistors with enhanced electron mobility, photosensitive matrices, etc. Such structures can be obtained by heteroepitaxial growth on a pre-patterned template: under proper heteroepitaxy conditions, nanoislands may nucleate in pits or grooves, forming a spatially ordered array of QDs. At the level of basic research, the microscopic mechanism of atomic diffusion on a non-planar crystal surface is not sufficiently studied. The aim of this paper is to elucidate the mechanism of atomic diffusion on pre-patterned Si substrates. To this end, a virtual Si(001)-1×2 structure with a system of parallel grooves was formed. The groove width and spacing were chosen equal, which corresponds to the geometry of experimental pre-patterned substrates prepared by nanoimprint lithography. An algorithm for calculating the energy surface of the pre-patterned substrate has been developed on the basis of the molecular dynamics method. The energy surface was mapped for the Si(001) substrate in the region of a groove. The positions of minima and saddle points on the energy surface were found, the surface diffusion activation energy was calculated for Ge atoms, and the typical migration paths of Ge atoms on the groove walls were determined. The microscopic mechanism of atomic diffusion on a pre-patterned substrate was analyzed, and possible reasons preventing atom migration into the grooves and the nucleation of 3D nanoislands there are discussed. MD simulations involve processing large volumes of data and require significant machine time. To accelerate the calculations, a parallel algorithm for neighbor search in a large system of atoms has been developed. The dependence of computation time on the number of cores within a single node has been obtained; the acceleration is shown to be linear in the number of cores, at least from 1 to 8.
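
    A standard way to organize neighbor search in large atomic systems is the cell-list technique; the sketch below is a generic serial version of that idea (the paper's parallel algorithm is not reproduced, but each cell's work is independent and can be assigned to a separate core).

```python
import itertools
import numpy as np

def neighbor_pairs(coords, cutoff):
    """Cell-list neighbor search: O(N) binning instead of O(N^2) pair tests.

    Atoms are binned into cells of side >= cutoff, so any neighbor of an
    atom lies in its own cell or one of the 26 adjacent cells.
    """
    coords = np.asarray(coords, dtype=float)
    cells = {}
    for i, p in enumerate(coords):
        cells.setdefault(tuple((p // cutoff).astype(int)), []).append(i)
    pairs = []
    for cell, atoms in cells.items():          # parallelizable over cells
        for d in itertools.product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, d))
            for i in atoms:
                for j in cells.get(other, ()):
                    if i < j and np.linalg.norm(coords[i] - coords[j]) < cutoff:
                        pairs.append((i, j))
    return pairs

pts = np.random.rand(500, 3) * 20.0            # 500 atoms in a 20x20x20 box
print(len(neighbor_pairs(pts, cutoff=2.5)), "pairs within 2.5")
```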

  • ANALYSIS OF THE KINETIC PARAMETERS CALCULATION PARALLELING EFFICIENCY IN THE COMPLEX CHEMICAL REACTION

    K.F. Koledina, M.K. Vovdenko, I.M. Gubaydullin, S.N. Koledin
    153-163
    Abstract

    The aim of this study is to calculate the kinetic parameters of the isopropylbenzene oxidation reaction for several schemes of chemical transformations, using parallelization of the computational problem, and to analyze the parallelization efficiency. The main levels of parallelization for solving the inverse kinetic problem are the following: the first group includes all mechanisms proposed for the chemical reaction; for each mechanism, all reaction experiments are considered; for each experiment, the parameter plane is partitioned to search for kinetic parameters. The solution of an inverse kinetic problem is an optimization problem for which there are established models of parallel computation: the island model, the cell model, and the global Master-Worker model. The object of the study is the oxidation of isopropylbenzene by atmospheric oxygen. The reaction is one of the stages in the technological process of producing phenol and acetone by the cumene method, which is the most common industrial method in the world for synthesizing these substances. The reaction is a radical chain process; the basic elementary reactions for the stages of chain initiation, chain propagation, and chain termination are considered. To solve the direct and inverse kinetic problems and determine the kinetic parameters of the elementary stages, mathematical methods such as the 4th-order Runge-Kutta method and a variable-order method in the MATLAB environment are used. Kinetic models have been developed for the three reaction schemes of isopropylbenzene oxidation and compared. Parallelization models are applied during the development of the kinetic models, and the parallelization efficiency is analyzed. The efficiency of parallelizing the solution of the inverse problem for the reaction under consideration by a genetic algorithm with the island model of parallelization, on a personal 4-core Intel Core i5 computer, is 65 %.
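
    A schematic of the island-model parallelization named above, with a toy objective standing in for the inverse kinetic problem's residual; all parameters here are hypothetical.

```python
import random

def evolve(pop, fitness, gens=50, mut=0.1):
    """One island: simple truncation-selection evolution of real vectors."""
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[: len(pop) // 2]
        children = [[g + random.gauss(0, mut) for g in random.choice(parents)]
                    for _ in range(len(pop) - len(parents))]
        pop = parents + children
    return pop

def island_ga(fitness, n_islands=4, island_size=20, dim=3, epochs=5):
    """Island model: islands evolve independently (one per core on a
    multicore CPU), periodically migrating their best individual."""
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        islands = [evolve(pop, fitness) for pop in islands]  # parallelizable
        for k, pop in enumerate(islands):                    # ring migration
            best = min(pop, key=fitness)
            islands[(k + 1) % n_islands][-1] = list(best)
    return min((min(pop, key=fitness) for pop in islands), key=fitness)

# Toy objective standing in for the inverse-problem residual.
best = island_ga(lambda v: sum((x - 1.0) ** 2 for x in v))
print([round(x, 2) for x in best])
```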

  • EVALUATION OF THE SIDELOBE SUPPRESSION FILTER EFFECTIVENESS IN THE CONDITION OF THE BARKER SIGNALS MATCHED FILTERING

    E.E. Zavtur, I.I. Markovich, A.I. Panychev
    163-173
    Abstract

    The aim of the study is to synthesize an additional sidelobe suppression filter, matched to signals based on Barker codes, that provides the best compromise between the SNR loss and the reduction of the sidelobe level at the filter output. To achieve this goal, the coefficients of the sidelobe suppression filter must be found and its effectiveness evaluated. The digital sidelobe suppression filter is synthesized by solving the optimization problem of maximizing the relative value of the main lobe of the output signal of a non-optimal filter under a constraint on the value of its sidelobes. Sidelobe suppression filters are investigated for the matched filtering of signals formed on the basis of Barker sequences of different lengths. In all cases, the minimum possible length of the sidelobe suppression filter impulse response is chosen, equal to the number of samples at the matched filter output. Quantitative estimates of the sidelobe suppression and the accompanying SNR loss at the output of the synthesized filter are obtained for signals based on different Barker codes. It is established that the maximum sidelobe suppression is achieved for the 5-element (11.65 dB) and 13-element (11.7 dB) Barker codes; the sidelobe levels in the output signal are minus 25.63 dB and minus 33.98 dB, respectively. The minimum SNR loss is obtained for signals based on the 11-element (1.56 dB) and 13-element (0.73 dB) Barker codes; for the 5-element Barker code this value is 1.75 dB. Thus, the optimal ratio between the degree of sidelobe suppression and the SNR loss, when the matched filter output is processed by an additional mismatched filter, is achieved for 13-element Barker code signals. In cases where significant restrictions are imposed on the duration of the output signal, the 5-element Barker code is preferred.
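
    The sidelobe structure that the additional filter must suppress is easy to reproduce: a matched filter for a Barker code is just correlation with the code itself, so its noise-free output is the code's autocorrelation function.

```python
import numpy as np

barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)

# Matched filtering of the code with itself = its autocorrelation function.
acf = np.correlate(barker13, barker13, mode="full")
peak = acf.max()                                        # main lobe: 13
sidelobe = np.abs(np.delete(acf, acf.argmax())).max()   # sidelobes: |+-1|
print(f"peak-to-sidelobe ratio: {20 * np.log10(peak / sidelobe):.2f} dB")
# -> 22.28 dB; the paper's additional mismatched filter pushes the
#    sidelobes further down at the cost of a small SNR loss.
```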

  • METHOD OF TEXT-INDEPENDENT PERSONALITY IDENTIFICATION BY VOICE

    Y.A. Bryukhomitsky, V.M. Fedorov
    173-181
    Abstract

    An immunological method is proposed for solving the problem of text-independent speaker identification, based on the principles of representing and processing voice information adopted in artificial immune systems. For identifying a person by voice, a Fant model is used, in which the voice signal is formed by passing through a high-order filter. Cepstral coefficients obtained from a linear speech predictor are used as feature vectors. The subsequent analysis of the feature vectors is carried out with the apparatus of artificial immune systems using an immunological negative-selection model. The model implements decentralized recognition of sequentially arriving speech fragments by comparing them with special, previously created recognition elements, detectors, which imitate the immunocompetent cells of the immune system. The matching is carried out using the Euclidean proximity measure according to the principle of negative selection. During speech signal analysis, the "known/stranger" decision is made based on statistics of the detector response frequency. The method has been experimentally tested in the MATLAB IDE and has shown its effectiveness. The method is intended for continuous authentication of the speaker's identity at the rate of incoming voice data, when text of arbitrary size and content is reproduced, and allows a timely decision about a possible substitution of speakers. The advantage of the method is its complete protection against replay attacks. Effective implementation of the method and improvement of its accuracy are closely related to the possibility of parallel processing of large amounts of data, determined by the size of the analyzed text and the size of the detector population. This circumstance makes the use of high-performance multiprocessor computing systems promising.
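
    The negative-selection step can be sketched generically: detectors are random points that survive only if they match no "self" (enrolled-speaker) feature vector; a detector firing on incoming speech then supports the "stranger" decision. Radii, dimensions, and the synthetic features below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_detectors(self_vectors, n=500, radius=0.6, dim=12):
    """Negative selection: keep random detectors that lie farther than
    `radius` (Euclidean) from every self feature vector."""
    cand = rng.uniform(-1, 1, size=(n, dim))
    dists = np.linalg.norm(cand[:, None, :] - self_vectors[None, :, :], axis=2)
    return cand[dists.min(axis=1) > radius]

def stranger_score(detectors, frames, radius=0.6):
    """Fraction of incoming speech frames that activate any detector;
    a high response frequency supports the "stranger" decision."""
    d = np.linalg.norm(frames[:, None, :] - detectors[None, :, :], axis=2)
    return (d.min(axis=1) <= radius).mean()

self_vecs = rng.normal(0.0, 0.2, size=(200, 12))    # enrolled speaker's cepstra
detectors = train_detectors(self_vecs)
own = rng.normal(0.0, 0.2, size=(50, 12))           # same speaker (synthetic)
other = rng.normal(0.7, 0.2, size=(50, 12))         # different speaker (synthetic)
print(stranger_score(detectors, own), stranger_score(detectors, other))
```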

  • HARDWARE-ORIENTED ALGORITHM OF A QUATERNION CRYPTOSYSTEM

    K.S. Kuznetsova, E.I. Dukhnich
    182-190
    Abstract

    The need to protect information in electronic form is due to the process of global computerization. The most common way to protect information is the use of cryptographic methods, namely data encryption algorithms. Currently, the development of information technology focuses on increasing the computing power of computers, which adversely affects the cryptographic strength of most existing information protection algorithms; this is the reason for the continuous activity in creating and improving cryptographic systems. Since the hardware implementation of a cryptographic algorithm ensures its integrity and also increases the speed of data processing, the purpose of this work was to develop an algorithm oriented toward hardware implementation. The analysis shows that block ciphers with matrix multiplication are promising in this direction. Therefore, the matrix quaternion cipher R4 was taken as the source algorithm, since its encryption process is based on matrix multiplication, which ensures ease of implementation and high performance. This algorithm was also chosen because it uses quaternions to create key matrices, which allow generating direct and inverse matrices without significant cost; this reduces the number of computational operations needed for encryption and decryption, since the cryptographic algorithm is symmetric. The study is aimed at finding a matrix of this type for which only addition and shift operations are used in encryption and decryption. The article describes the obtained HW-R4 algorithm and the principles of its hardware implementation, and compares it with existing matrix quaternion algorithms by the characteristics of irregular deviations, the correlation coefficient, and a visual comparison of encrypted images and function graphs. Further development of the algorithm is possible in its direct hardware implementation, for example using a programmable logic device (FPGA).
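
    The key-matrix idea can be illustrated in general terms: the left-multiplication action of a quaternion q is a 4×4 matrix whose inverse is obtained almost for free from the conjugate. This is the general quaternion property the abstract relies on; the specific R4/HW-R4 key schedule is not reproduced here.

```python
import numpy as np

def quat_matrix(w, x, y, z):
    """4x4 matrix of left multiplication by the quaternion q = w+xi+yj+zk."""
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]], dtype=float)

def quat_matrix_inv(w, x, y, z):
    """Inverse key matrix: conjugate quaternion scaled by 1/|q|^2,
    so no general matrix inversion is ever needed."""
    n2 = w * w + x * x + y * y + z * z
    return quat_matrix(w, -x, -y, -z) / n2

K = quat_matrix(2, 1, -1, 3)
block = np.array([4.0, 7.0, 1.0, 9.0])        # one plaintext block
cipher = K @ block                            # encryption: matrix multiply
assert np.allclose(quat_matrix_inv(2, 1, -1, 3) @ cipher, block)
```

    Restricting the quaternion components to powers of two is one way such a matrix multiply could reduce to additions and shifts, which is the direction the HW-R4 search pursues.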

  • MULTIPLE-PRECISION SUMMATION ON CENTRAL AND GRAPHICS PROCESSORS USING THE MPRES LIBRARY

    K.S. Isupov, V.S. Knyazkov, A.S. Kuvaev
    191-203
    Abstract

    In many scientific applications it is necessary to compute sums of floating-point numbers. Summation is a building block for many numerical algorithms, such as dot products, Taylor series, polynomial interpolation, and numerical integration. However, the summation of large sets of numbers in finite-precision IEEE 754 arithmetic can be very inaccurate due to the accumulation of rounding errors. There are various ways to diminish rounding errors in floating-point summation; one of them is the use of multiple-precision arithmetic libraries. Such libraries provide data structures and subroutines for processing numbers whose precision exceeds the IEEE 754 floating-point formats. In this paper we consider multiple-precision summation on hybrid CPU-GPU platforms using MPRES, a new software library for multiple-precision computations on CPUs and CUDA-compatible GPUs. Unlike existing multiple-precision libraries based on the binary representation of numbers, MPRES uses a residue number system (RNS). In RNS, a number is represented as a tuple of residues obtained by dividing the number by a given set of moduli, and multiple-precision operations such as addition, subtraction, and multiplication naturally split into groups of reduced-precision operations on residues, performed in parallel and without carry propagation. We consider the algorithm for adding multiple-precision floating-point numbers in MPRES, as well as three summation algorithms: (1) recursive summation, (2) pairwise summation, and (3) block-parallel hybrid CPU-GPU summation. Experiments show that the hybrid algorithm fully utilizes the GPU's resources and therefore demonstrates the best performance. In addition, the parallel computation of the digits (residues) of multiple-precision significands in RNS reduces computation time.
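
    Two of the ingredients above are easy to demonstrate in miniature: carry-free RNS arithmetic on residues, and pairwise summation, whose rounding-error growth is O(log n) versus O(n) for plain recursive summation. The moduli below are toy values, not MPRES's.

```python
import numpy as np

MODULI = (251, 253, 255, 256)        # toy pairwise-coprime moduli

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Addition in RNS: independent per-modulus sums with no carry
    propagation, so all residue channels can execute in parallel."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

assert rns_add(to_rns(123456), to_rns(654321)) == to_rns(123456 + 654321)

def pairwise_sum(v):
    """Pairwise summation: error grows O(log n) instead of O(n)."""
    v = np.asarray(v, dtype=np.float32)
    while v.size > 1:
        if v.size % 2:
            v = np.append(v, np.float32(0.0))   # pad to even length
        v = v[0::2] + v[1::2]                   # one parallel reduction level
    return float(v[0])

data = np.full(10**6, 0.1, dtype=np.float32)
naive = np.float32(0.0)
for x in data:
    naive += x                                  # recursive summation
print(float(naive), pairwise_sum(data))         # ~100958 vs ~100000
```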

SECTION IV. RECONFIGURABLE AND NEURAL NETWORK COMPUTING SYSTEMS

  • IMPLEMENTATION OF THE DNA ASSEMBLY PROBLEM ON RECONFIGURABLE COMPUTER SYSTEMS

    A.I. Levina, E.E. Semernikova, D.A. Sorokin
    204-212
    Abstract

    The paper deals with methods and tools for the DNA assembly problem that considerably reduce processing time at a specified accuracy in comparison with other methods and tools. We consider the use of reconfigurable computer systems for assembly problems. As an example we implement a key procedure of the Velvet Assembler genome assembly algorithm: the contig generation procedure VelvetH. Velvet Assembler is based on a new-generation method that builds a de Bruijn graph and, as a result, produces considerably variable density of data flows in the NP-complete DNA assembly problem. For this reason, in addition to the structural-procedural organization of calculations traditional for reconfigurable computer systems, we used special methods of synthesizing parallel-pipeline applications to make the implementation of such problems on reconfigurable computer systems possible in principle. To evaluate the efficiency of reconfigurable computer systems, we developed, based on the VelvetH procedure, a parallel-pipeline application that assembles a genome from short reads of Staphylococcus aureus DNA. The data were taken from the Sequence Read Archive database on the National Center for Biotechnology Information website. The parallel-pipeline application was tested on the reconfigurable computer system "Tertius", built on four Xilinx Kintex UltraScale XCKU095 FPGAs. This reconfigurable computer system provides a 24-fold (and greater) reduction of the execution time of the contig generation procedure for the DNA assembly problem compared with existing analogs. We therefore conclude that the use of reconfigurable computer systems for the DNA assembly problem is a promising direction that requires further scientific and technical research.
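
    The de Bruijn graph construction at the heart of VelvetH-style assembly is compact enough to sketch: nodes are (k-1)-mers, and every k-mer of every read contributes an edge. The greedy contig extension below is a toy stand-in for the real procedure.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: one edge (k-1)-mer -> (k-1)-mer per k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def extend_contig(graph, start):
    """Greedily follow unambiguous edges: a toy contig-generation step."""
    contig, node = start, start
    while len(set(graph.get(node, []))) == 1:   # unambiguous extension only
        node = graph[node][0]
        contig += node[-1]
        if len(contig) > 100:                   # guard against cycles
            break
    return contig

g = de_bruijn(["ACGTT", "GTTCA"], k=3)
print(extend_contig(g, "AC"))                   # -> ACGTTCA: the reads merged
```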

  • MULTIGRID METHOD TO SOLVE SPARSE LARGE AND EXTRA-LARGE SLAE BY RECONFIGURABLE COMPUTING SYSTEM

    A.V. Podoprigora, M.D. Chekina
    212-221
    Abstract

    This paper considers solving large and extra-large sparse systems of linear algebraic equations (SLAE) by the multigrid method on reconfigurable computing systems (RCS). Computer modeling is becoming increasingly relevant: replacing physical prototypes, it is used in many areas of science and technology and makes it possible to predict natural processes and phenomena. Such modeling is based on physical and mathematical models that take the form of systems of linear equations whose main matrix operator has a sparse structure. The ability to attack large and extra-large sparse systems of linear equations improves calculation accuracy and increases the volume of data that can be processed. The multigrid method is chosen for assessing the efficiency of RCS on sparse large and extra-large SLAE because of its speed of convergence and precision of calculation. Solving SLAE by the multigrid method on RCS is a tightly coupled high-performance task, in which the number of interprocessor and memory exchanges is comparable to or exceeds the number of executed operations. Efficient implementation of this task therefore requires both multichannel and nonlinear memory access. This is impossible to implement on computing systems of traditional architecture and directly affects performance. High performance can be achieved with multi-pipeline calculations, so we use a more flexible computing architecture, RCS based on FPGAs. Recent studies have revealed that the most demanding function of the multigrid method is sparse general matrix-matrix multiplication (SpGEMM). The use of RCS can decrease the solution time for large and extra-large sparse SLAE, as the research results show by the example of sparse general matrix-matrix multiplication. A comparison of RCS performance with a multiprocessor computing system shows a multiple advantage of the RCS.
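
    The SpGEMM kernel that dominates the multigrid method (it arises, for example, when forming the Galerkin coarse-grid operator R*A*P) can be sketched row-wise; in hardware each output row's accumulation maps onto an independent pipeline.

```python
def spgemm(a_rows, b_rows):
    """Row-wise sparse matrix-matrix product C = A*B.

    Matrices are dicts: row index -> {col index: value}. Each output row
    is accumulated independently, which is what maps well onto parallel
    pipelines in a reconfigurable implementation.
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = {}
        for k, a_ik in a_row.items():               # for each nonzero A[i,k]
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        c_rows[i] = acc
    return c_rows

A = {0: {0: 2.0, 2: 1.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {2: 6.0}}
print(spgemm(A, B))   # {0: {1: 8.0, 2: 6.0}, 1: {0: 15.0}}
```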

  • IMPLEMENTATION OF INVERSE KINEMATIC PROBLEM OF SEISMIC EXPLORATION FOR MICROSEISMIC MONITORING ON RECONFIGURABLE COMPUTER SYSTEMS IN REAL TIME

    I.I. Levin, K.N. Alekseev
    221-230
    Abstract

    The paper considers the possibility of creating digital hydrocarbon deposit models in real time on the basis of "passive" microseismic monitoring data. Real-time processing of primary seismic data on multiprocessor computer systems of traditional architecture is impossible because of the large amount of processed data, the complexity of organizing storage of intermediate results, and the labor-intensive operations. There is a different paradigm of organizing the computational process for solving labor-intensive tightly coupled problems, based on the synthesis of parallel-pipeline programs for reconfigurable computer systems (RCS). According to this approach, the problem is represented as an information graph, consisting of a set of vertices (the performed operations) and a set of arcs describing the sequence of data transfer between the vertices, as well as the input and output signals. Traditional methods of automatic synthesis of computational structures involve direct mapping of the information graph, or part of it, onto the computational field of the RCS, built from a set of interconnected FPGAs. This approach provides the maximum performance of the computer system, using all available hardware resources. However, when real-time problems are solved on an RCS by traditional methods, the resulting performance is often higher than necessary, which leads to overspending of the RCS resource, increased energy consumption and, as a result, excessive cost of the final product. In this regard, a new method of synthesizing parallel-pipeline programs for RCS was proposed for determining the minimum hardware resource at a given solution time. According to the new approach, the information graph of the problem must be transformed so that the synthesized computational structure has exactly the required performance. The application of the new method is demonstrated on the main computationally intensive problem of microseismic monitoring: the inverse kinematic problem of seismic exploration. The minimum hardware cost for a given solution time is estimated, and several RCS configurations are proposed. Analysis of the results proved the effectiveness of the new approach in comparison with traditional methods. Therefore, the new method of creating parallel-pipeline programs for RCS can be used for solving real-time problems.

  • NEURAL NETWORK SOLUTIONS FOR THE CONTROL OF HEXAPOD FOR NVIDIA JETSON EMBEDDED PLATFORM

    Y.A. Zhukov, E.B. Korotkov, A.V. Moroz
    231-241
    Abstract

    This research is part of the work carried out by BSTU "Voenmeh" under the financial support of the Ministry of Education and Science of the Russian Federation for the design and development of a precision parallel-kinematics mechanism called "Hexapod". Newly released embedded artificial intelligence platforms attract the interest of research engineers in implementing modern control algorithms at a new qualitative level. The purpose of this work is to obtain neural network solutions to hexapod control problems for the modern NVIDIA Jetson embedded platform. The hexapod control problems are presented, which include solving the forward and inverse kinematics and controlling the forces at the hexapod's legs based on computing the inverse dynamics model that implements the desired trajectory in Cartesian coordinates. We propose to apply neural networks for solving the forward kinematics problem and for approximating the inverse Jacobian matrices in the computation of the inverse dynamics model. We used the Neural Network Toolbox in MATLAB to train the neural networks and to test the proposed algorithms. The results of training neural networks for the forward kinematics problem are presented, with an accuracy more than 10 times better than the specified error of the control system over the whole workspace. The architecture of the neural network for approximating the inverse Jacobian matrix is presented, and a mathematical description of the neural network control algorithms is given. An approach to creating software for the NVIDIA Jetson embedded platform is described. A CUDA implementation of the developed algorithms for the Jetson TX1 platform was performed; testing showed a threefold speed advantage of the parallel algorithms in solving the forward kinematics problem compared to the traditional iterative approach based on the Newton-Raphson method.
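
    The approximation idea is standard: a small feed-forward network maps the six leg lengths to the platform pose, replacing the Newton-Raphson iteration of the forward kinematics with one fixed-cost pass. A forward-pass sketch follows; the weights are untrained placeholders, not the network trained in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

class MLP:
    """Tiny fully connected net: 6 leg lengths -> 6-DOF platform pose.

    Weights below are random placeholders; in the paper's setting they
    would be trained on (leg lengths, pose) pairs sampled over the
    hexapod workspace.
    """
    def __init__(self, sizes=(6, 32, 32, 6)):
        self.layers = [(rng.normal(0, 0.3, (m, n)), np.zeros(n))
                       for m, n in zip(sizes[:-1], sizes[1:])]

    def __call__(self, x):
        for i, (w, b) in enumerate(self.layers):
            x = x @ w + b
            if i < len(self.layers) - 1:
                x = np.tanh(x)               # hidden nonlinearity
        return x                             # pose: x, y, z, roll, pitch, yaw

net = MLP()
leg_lengths = np.array([1.02, 0.98, 1.01, 0.99, 1.00, 1.03])
print(net(leg_lengths))                      # one fixed-cost forward pass
```

    The dense matrix-vector products of such a pass are exactly what maps well onto the CUDA cores of the Jetson platform.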

  • MEDICAL ARTIFICIAL INTELLIGENCE SYSTEMS USING AN EXAMPLE OF THE LUNG CANCER DIAGNOSIS

    L.V. Utkin, O.S. Ipatov, M.A. Ryabinin, A.A. Meldo
    241-249
    Abstract

    Taking into account the rapid development of new methods of artificial intelligence and the large number of new developments related to intelligent systems for diagnosing oncological diseases, the aim of this work is to consider the main peculiarities of such systems and to develop a prospective system architecture that increases the efficiency of their training and the accuracy of the diagnostic results. The paper offers a brief analysis of intelligent systems for diagnosing oncological diseases using the example of lung cancer detection from computed tomography images, which are currently the main diagnostic tool for determining the prevalence of lung cancer and searching for local and distant metastases. The main types of existing intelligent diagnostic systems are considered and divided into subgroups from the point of view of how the computed tomography information is processed. A description is given of the typical sequence of stages in computed tomography image processing for detecting malignant lung tumors, which includes dataset collection, image pre-processing, segmentation, detection of lung nodules, reduction of false positives, and tumor classification. It is shown that the main problem of most differential diagnosis systems is that the training sample contains few alternative examples of various types of cancer and cannot be fully used to train the intelligent diagnostic system. To solve this problem, a new architecture of the intelligent diagnostic system is proposed, which significantly increases the accuracy of lung nodule classification at the last stages of data processing. The core of this architecture is a Siamese neural network, which consists of two identical subnetworks with shared parameters connected at the output. The training process uses all possible pairs of samples from the image base of malignant tumors, which significantly increases the size of the training sample and eliminates the effect of overfitting. During testing, an analyzed computed tomography image, as an example of unknown tissue, is fed to the input of one of the subnetworks, and an image from the base of malignant tumors is fed to the input of the second one.
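
    The core computation of a Siamese classifier can be sketched in a few lines: one shared embedding is applied to both inputs, and the decision is driven by the distance between the embeddings. The embedding here is an untrained placeholder, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(0, 0.1, (256, 32))            # shared weights: the "twin" subnets

def embed(x):
    """Shared subnetwork: both inputs pass through the SAME parameters."""
    return np.tanh(x @ W)

def pair_distance(x1, x2):
    """Small distance suggests the same class; in training, a contrastive
    loss over all pairs pulls same-class embeddings together."""
    return float(np.linalg.norm(embed(x1) - embed(x2)))

nodule_a = rng.random(256)                   # stand-ins for CT-patch features
nodule_b = rng.random(256)
print(pair_distance(nodule_a, nodule_b))

# Pairing is what enlarges the training set: n labelled images yield
# about n*(n-1)/2 pairs, e.g. 1000 images give ~499,500 training pairs.
```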

  • DATA STRUCTURE COMPOSITION FOR THE DATA PROCESSING BY THE RECONFIGURABLE COMPUTING SYSTEMS

    S.A. Butenkov
    250-262
    Abstract

    The processes of accumulation, compression, storage, extraction, processing, and analysis of data are traditionally considered in different branches of theoretical informatics. To solve the applied problems of technically implementing these stages of working with data, methodologically different approaches are used, based on heterogeneous mathematical data models and, accordingly, technically different software and hardware. At the same time, the optimization of data processing facilities is considered at each stage separately and with particular mathematical data models. This places the developers of complex data processing systems in a situation where, in addition to the actual processing, they must convert the data presentation forms for the next stage of processing. Such intermediate conversions of data formats consume significant hardware resources and time, especially for large amounts of data (Big Data). In a number of our works, a new mathematical apparatus for presenting and processing data, based on the theory of algebraic systems for granular (integrated) data representation, has been introduced, developed, and applied in new computing facilities. The new approach implements the ideas of granular computing introduced by Lotfi Zadeh. It organically includes all the specified stages of working with data (on a uniform mathematical and algorithmic basis) and allows wide use of effective algorithms of linear complexity (greedy algorithms) in tasks related to data storage and processing. The new mathematical representation allows the data to be compressed naturally at all stages of processing, owing to the basic properties of the information granulation methodology. Since methods based on the most typical algorithms of granular computing (without loops and branching) are effectively implemented on reconfigurable high-performance computing systems, this paper proposes structural solutions for implementing efficient "fast" algorithms of granular data processing by the reconfigurable means of such machines.