USAGE OF APACHE KAFKA FOR LOW-LATENCY IMAGE PROCESSING

N. Karpiuk, Halyna Klym, T. Tkachuk

Abstract


This study addresses the limitations of conventional centralized systems in handling the growing demand for real-time image processing. We propose a distributed architecture that uses Apache Kafka to achieve near real-time image analysis. The approach decouples image acquisition, processing, and distribution into separate stages, so that processing can run in parallel across a cluster of processing nodes. Kafka serves as the core communication and data-flow infrastructure, providing scalability, fault tolerance, and high throughput. Evaluations show substantial performance gains over a centralized system, confirming the feasibility of Kafka for distributed image processing and clarifying its advantages and limitations. We systematically analyzed how topic partitioning, consumer group configuration, and processing workload affect performance. This work offers a robust solution for near real-time image processing tasks and supports the development of efficient, scalable image analysis applications.
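The abstract identifies topic partitioning and consumer group configuration as the main levers on performance but does not report the authors' settings. The sketch below is a toy model under stated assumptions. It is not the authors' implementation. It hashes hypothetical image keys to partitions, using CRC32 as a stand-in for the murmur2 hash in Kafka's default Java partitioner. It then splits partitions among consumers in contiguous ranges, in the spirit of Kafka's range assignor, and assumes a fixed, hypothetical per-image processing time. The helper names `partition_for`, `range_assign`, and `simulate` are illustrative only. The model reflects the Kafka rule that each partition is read by at most one consumer within a group, so consumers beyond the partition count add no parallelism.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition (hypothetical helper).

    Kafka's default Java partitioner hashes keys with murmur2; CRC32 is
    used here only as a simple, deterministic stand-in.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions into contiguous ranges, one range per consumer.

    Consumers are sorted by id; the first (num_partitions % n) consumers
    get one extra partition. Consumers past the partition count get none.
    """
    base, extra = divmod(num_partitions, len(consumers))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(sorted(consumers)):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


def simulate(image_ids, num_partitions, consumers, seconds_per_image):
    """Estimate parallelism and batch completion time (toy model).

    Assumes every image costs the same fixed time and that consumers
    work fully in parallel with no network or broker overhead.
    Returns (active_consumers, images_per_consumer, makespan_seconds).
    """
    # Count how many images land on each partition.
    per_partition = defaultdict(int)
    for image_id in image_ids:
        per_partition[partition_for(image_id, num_partitions)] += 1

    # Each consumer processes every image in its assigned partitions.
    assignment = range_assign(num_partitions, consumers)
    load = {c: sum(per_partition[p] for p in parts) for c, parts in assignment.items()}
    active = sum(1 for parts in assignment.values() if parts)
    # The batch finishes when the most heavily loaded consumer finishes.
    makespan = max(load.values()) * seconds_per_image
    return active, load, makespan


if __name__ == "__main__":
    ids = [f"img-{i}" for i in range(100)]
    for n in (1, 2, 4, 6):
        consumers = [f"c{j}" for j in range(1, n + 1)]
        active, load, makespan = simulate(ids, 4, consumers, 0.05)
        print(f"{n} consumers -> {active} active, est. {makespan:.2f} s")
```

For example, with 100 keys, 4 partitions, and six consumers, `simulate` reports only 4 active consumers, because two of the six receive no partitions. With a single consumer, the estimate is the full serial cost of 100 × 0.05 s = 5 s. Real deployments also depend on key skew, broker and network overhead, and rebalancing, none of which this model captures.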

Key words: Apache Kafka, image processing, distributed environment, parallel processing.


DOI: http://dx.doi.org/10.30970/eli.26.5
