Characterizing Hardware Utilization on Edge Devices when Inferring Compressed Deep Learning Models

  • Ahmad Naufal Labiib Nabhaan Universitas AMIKOM, Yogyakarta, Indonesia
  • Rakandhiya Daanii Rachmanto University of Georgia, Athens, United States
  • Arief Setyanto Universitas AMIKOM, Yogyakarta, Indonesia
Keywords: Deep Learning, Edge Devices, Hardware Utilization, Memory Allocation, Post-training Quantization

Abstract

Implementing edge AI involves running AI algorithms near the sensors. Deep Learning (DL) models have achieved remarkable performance on image classification tasks; however, their demand for large computing resources hinders deployment on edge devices. Compressing a model is therefore essential to make DL inference feasible on such hardware. Post-training quantization (PTQ) is a compression technique that reduces the bit width used to represent the model's weight parameters. This study examines the impact of memory allocation on the latency of compressed DL models on the Raspberry Pi 4 Model B (RPi4B) and the NVIDIA Jetson Nano (J. Nano), aiming to understand utilization of the central processing unit (CPU), graphics processing unit (GPU), and memory. We adopt a quantitative method that controls memory allocation; measures warm-up time, latency, and CPU and GPU utilization; and compares the inference speed of DL models on the RPi4B and J. Nano. We also examine the correlation between hardware utilization and the observed inference latencies. Our experiments show that smaller memory allocations lead to higher latency on both the RPi4B and the J. Nano. CPU utilization on the RPi4B increases with the memory allocation, whereas the opposite holds on the J. Nano, since the GPU carries out the main computation on that device. Regarding computation, a smaller model size and a smaller bit representation lead to faster inference (lower latency), while a larger bit representation of the same DL model leads to higher latency.
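
As a concrete illustration of the PTQ workflow the abstract describes, the sketch below applies TensorFlow Lite post-training quantization to a Keras model. The model choice (MobileNetV2) and file name are illustrative assumptions, not necessarily the models used in the paper; dynamic-range quantization stores weights as int8 while keeping activations in float32, and the commented line shows a float16 variant.

```python
import tensorflow as tf

# Load a pretrained Keras model (MobileNetV2 is an illustrative choice,
# not necessarily one of the models evaluated in the paper).
model = tf.keras.applications.MobileNetV2(weights="imagenet")

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Dynamic-range PTQ: weights are stored as int8, activations stay float32.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# For a float16 variant instead, uncomment the following line:
# converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("mobilenetv2_dr_int8.tflite", "wb") as f:
    f.write(tflite_model)
```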
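
Likewise, a minimal measurement loop of the kind the methodology describes: it records warm-up time (the first inference, which includes one-time setup costs), per-inference latency, and average CPU utilization via psutil. The model path, run count, and random input are hypothetical placeholders.

```python
import time
import numpy as np
import psutil
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenetv2_dr_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input matching the model's input shape and dtype.
dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])

# Warm-up: the first invocation includes one-time setup costs.
t0 = time.perf_counter()
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
warmup_s = time.perf_counter() - t0

psutil.cpu_percent(interval=None)  # reset the utilization counter
latencies = []
for _ in range(100):
    t0 = time.perf_counter()
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    latencies.append(time.perf_counter() - t0)
cpu_pct = psutil.cpu_percent(interval=None)  # mean CPU use since reset

print(f"warm-up: {warmup_s * 1e3:.1f} ms")
print(f"median latency: {np.median(latencies) * 1e3:.1f} ms")
print(f"CPU utilization: {cpu_pct:.0f}%")
```

Note that psutil only sees CPU-side activity; on the Jetson Nano, GPU utilization is typically sampled with a separate tool such as tegrastats.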


Published
2024-11-06
How to Cite
Nabhaan, A., Rachmanto, R., & Setyanto, A. (2024). Characterizing Hardware Utilization on Edge Devices when Inferring Compressed Deep Learning Models. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 24(1), 25-38. https://doi.org/10.30812/matrik.v24i1.3938
Section
Articles
