METHOD OF HORIZONTAL SCALING OF DISTRIBUTED COMPUTING IN HIGH-LOAD SYSTEMS
Abstract
This article describes a method of using business analysis to choose the optimal solution for the practical application of a distributed streaming platform. The method relies on Apache Kafka, open-source software originally developed at LinkedIn and provided by the Apache Software Foundation under a permissive free software license. Scaling for a large number of concurrent requests is considered. The use of this platform to handle the current workload of systems with more than 2 million users is also discussed.
The work applies the method of horizontal scaling, which consists in dividing the system into separate structural components, often distributed across different servers, that can perform individual functions in parallel. This method requires changes to the program code, while the program must still operate as a single integrated system.
The task was implemented with Apache Kafka, which keeps the system within the established performance targets and follows the principle of horizontal scaling. The platform is used in real business projects. In particular, the American daily The New York Times uses Apache Kafka Streams to store and distribute published content in real time among the applications and systems that make it available to readers (subscription services, mobile applications, news aggregators, etc.). The Pinterest social photo network uses Kafka Streams to scale its intelligent budgeting system for its advertising infrastructure in real time, increasing the accuracy of financial cost forecasts. Rabobank, one of the largest banks in the Netherlands, uses Kafka Streams to notify customers about financial events in real time.
The principle of its operation is to increase throughput by increasing the size of the batch of requests. By a batch we mean a group of producer records sent as a whole to be stored in a single partition. Producers decide which batch is written to which partition. By default they follow cyclic scheduling (round-robin); in the general case this algorithm describes the scheduling and distribution of data batches. Here it acts as a batch manager for load balancing, or the target partition can instead be determined by a semantic separation function (for example, based on a key in the record), as the sketch below illustrates.
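A minimal producer sketch of both cases follows; the broker address, topic name "events", and key "user-42" are illustrative assumptions, not values from the article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Record without a key: the producer spreads batches across partitions
            // (round-robin, or sticky batching in newer Kafka versions).
            producer.send(new ProducerRecord<>("events", "unkeyed payload"));

            // Record with a key: the default partitioner hashes the key, so all
            // records with the same key go to the same partition (semantic separation).
            producer.send(new ProducerRecord<>("events", "user-42", "keyed payload"));
        }
    }
}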
The analysis carried out in this work showed that tuning the batch size parameter (batch.size) can increase throughput and reduce the processing load per batch as well as the number of streaming I/O operations. Under low load, however, Kafka can increase the latency of sending records, because the producer waits for the batch to be ready; the configuration sketch below shows this trade-off.
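The following configuration sketch assumes the standard Kafka producer settings; the concrete values and the use of linger.ms (which controls how long the producer waits for a batch to fill, and is not named in the article) are illustrative assumptions.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchTuningExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // batch.size: maximum bytes collected per partition batch (Kafka default is 16384).
        // A larger value lets more records be sent in one request, raising throughput.
        props.put("batch.size", 65536);

        // linger.ms (assumed parameter): how long the producer waits for a batch to fill
        // before sending. Under low load this waiting is the extra latency noted above.
        props.put("linger.ms", 20);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // ... send records as in the previous sketch ...
        producer.close();
    }
}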
Based on the results obtained, it can also be argued that simply reading and writing a data stream is not enough; the main goal is to process streams in real time. For this purpose the Kafka Streams API can be used: a stream processor receives continuous data streams from input topics, performs some processing on that input, and produces a continuous data output, as in the sketch below.
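A minimal Kafka Streams topology sketch follows; the application id, broker address, topic names, and the uppercase transformation are illustrative assumptions rather than the article's actual processing logic.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamProcessingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-stream-processor"); // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read a continuous stream from an input topic, transform each record,
        // and write the result to an output topic without stopping the stream.
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}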
Thus, thanks to the new architecture, it was possible to scale the system without adding hardware. At the same time, the system was moved to a new, modular principle of operation (since Kafka allows modules and applications to be connected regardless of programming language), supporting further growth of the client's business.
Keywords: business analysis, architectural solutions, Apache Kafka, Apache Connector, Apache Streams, horizontal scaling, distributed computing, parallel computing.
DOI: http://dx.doi.org/10.30970/eli.13.7