OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
Abstract
Background. Modern approaches to preparing deep learning object detection models are described. The TensorFlow and TensorFlow Lite frameworks serve as the basis for model training and inference. The concepts of deep learning model optimization are analyzed.
Materials and Methods. Quantized int8 models are used as the baseline for estimating optimization effectiveness. The delegation approach replaces neural network operations with software- or hardware-optimized variants in order to speed up inference on the target device. The base-optimized model with int8 weights is the use case for devices with limited performance resources and for microcontrollers without floating-point units. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the relationship between optimization approach, single-board platform type, and model inference speed is analyzed.
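The idea behind int8 weights can be illustrated with a minimal sketch of affine (scale and zero-point) quantization in plain Python. This is not TensorFlow Lite's implementation, which differs in details such as per-channel scales; the function names and the sample weights are hypothetical and chosen only for illustration.

```python
def quantization_params(x_min, x_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping the float range onto int8."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        return 1.0, 0  # all values are zero; any scale works
    zero_point = int(round(q_min - x_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Map floats to int8 codes, clamping to the representable range."""
    return [
        max(q_min, min(q_max, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize(q_values, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical float32 weights, for illustration only.
weights = [-0.62, -0.1, 0.0, 0.33, 1.27]
scale, zero_point = quantization_params(min(weights), max(weights))
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
print(scale, zero_point, codes, restored)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error bounded by the scale, which is why such models suit devices without floating-point hardware.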
Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). Each model was prepared for the experiment with 8-bit full integer quantization and exported as a TFLite model using the TensorFlow Object Detection API Docker images and Python 3.11. The test samples are taken from the MS COCO validation dataset archive. The input image size is 640x640 RGB. Recognition time for 640x640 RGB images was compared on Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per image, i.e., at least 10 fps), as it has higher CPU performance than the other devices.
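A latency comparison of this kind can be sketched as a small timing harness. The code below is an assumption about how such a measurement might be structured, not the authors' benchmark procedure: `dummy_inference` is a hypothetical stand-in for a real model invocation, and the warm-up and run counts are arbitrary. The 100 ms limit is the real-time threshold quoted in the text, and frames per second is simply 1000 divided by the latency in milliseconds.

```python
import statistics
import time

REAL_TIME_LIMIT_MS = 100.0  # real-time threshold quoted in the text


def measure_latency_ms(run_inference, warmup=3, runs=20):
    """Time repeated calls of run_inference; return per-call latency in ms."""
    for _ in range(warmup):  # warm-up calls are not recorded
        run_inference()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms, limit_ms=REAL_TIME_LIMIT_MS):
    """Report median latency, the equivalent fps, and the real-time verdict."""
    median_ms = statistics.median(samples_ms)
    fps = 1000.0 / median_ms if median_ms > 0 else float("inf")
    return {"median_ms": median_ms, "fps": fps, "real_time": median_ms <= limit_ms}


def dummy_inference():
    # Hypothetical workload standing in for a single model invocation.
    sum(i * i for i in range(10_000))


report = summarize(measure_latency_ms(dummy_inference))
print(report)
```

The median is used here rather than the mean so that occasional scheduling spikes on a single-board computer do not dominate the result; this is a design choice of the sketch.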
Conclusion. The real-time execution approach was confirmed using reference models with a reduced input image size (320x320 RGB). Standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used on the Jetson Nano target as the NPU-optimized case. Real-time execution (at most 100 ms per image, i.e., at least 10 fps) is reached for most models and target devices. Such an approach is suitable for less powerful devices with ARM Cortex-A processors.
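The effect of the reduced input size can be made concrete with a short calculation. The sketch below only counts input values for a one-byte-per-channel RGB image; the helper name is hypothetical. Treating inference cost as roughly proportional to pixel count is an assumption, and actual speedups depend on the model and the hardware.

```python
def input_tensor_bytes(height, width, channels=3, bytes_per_value=1):
    """Size of one input image tensor in bytes (uint8/int8 RGB by default)."""
    return height * width * channels * bytes_per_value


full = input_tensor_bytes(640, 640)     # 640x640 RGB input
reduced = input_tensor_bytes(320, 320)  # 320x320 RGB input
ratio = full / reduced

print(full, reduced, ratio)
```

Halving each side of the image leaves a quarter of the pixels to process.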
Keywords: single-board computers, modeling, benchmarking, neural networks, object detection, optimization.
DOI: http://dx.doi.org/10.30970/eli.29.6