Characterizing Hardware Utilization on Edge Devices when Inferring Compressed Deep Learning Models

Ahmad Naufal Labiib Nabhaan; Rakandhiya Daanii Rachmanto; Arief Setyanto

doi:10.30812/matrik.v24i1.3938

Authors

Ahmad Naufal Labiib Nabhaan, Universitas AMIKOM, Yogyakarta, Indonesia
Rakandhiya Daanii Rachmanto, University of Georgia, Athens, United States
Arief Setyanto, Universitas AMIKOM, Yogyakarta, Indonesia

Keywords:

Deep Learning, Edge Devices, Hardware Utilization, Memory Allocation, Post-training Quantization

Abstract

Implementing edge AI involves running AI algorithms near the sensors. Deep learning (DL) models have tackled image classification tasks with remarkable performance. However, their demand for large computing resources hinders their deployment on edge devices. Compressing the model is therefore essential for deploying DL models on edge devices. Post-training quantization (PTQ) is a compression technique that reduces the bit representation of the model's weight parameters. This study examines the impact of memory allocation on the latency of compressed DL models on the Raspberry Pi 4 Model B (RPi4B) and the NVIDIA Jetson Nano (J. Nano). It aims to understand hardware utilization in the central processing unit (CPU), graphics processing unit (GPU), and memory. The study uses a quantitative method that controls memory allocation and measures warm-up time, latency, and CPU and GPU utilization. It compares the inference speed of DL models on the RPi4B and J. Nano and observes the correlation between hardware utilization and inference latency across the DL models. Our experiments show that smaller memory allocation leads to higher latency on both the RPi4B and J. Nano. CPU utilization on the RPi4B increases with memory allocation; the opposite holds on the J. Nano, where the GPU carries out the main computation. Regarding computation, a smaller model size and a smaller bit representation lead to faster inference (lower latency), while a larger bit representation of the same DL model leads to higher latency.
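
The measurement approach summarized in the abstract (timing a warm-up run separately from steady-state inference, then correlating a hardware metric with latency) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `measure_latency` and `pearson`, the `infer` callable, and the choice to treat the first call as the warm-up are all assumptions made for illustration, and the Pearson coefficient is used here only as one common way to quantify such a correlation.

```python
import math
import time
from typing import Callable, List, Sequence, Tuple


def measure_latency(infer: Callable[[], object], runs: int = 10) -> Tuple[float, List[float]]:
    """Return (warm_up_seconds, per_run_latencies_seconds).

    Assumption: the first call to `infer` is the warm-up run and is timed
    on its own; the next `runs` calls are treated as steady-state inference.
    """
    start = time.perf_counter()
    infer()
    warm_up = time.perf_counter() - start

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    return warm_up, latencies


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical stand-in workload; a real experiment would call the
    # compressed model's inference function here instead.
    def dummy_infer() -> int:
        return sum(i * i for i in range(50_000))

    warm_up, latencies = measure_latency(dummy_infer, runs=5)
    mean_latency = sum(latencies) / len(latencies)
    print(f"warm-up: {warm_up:.4f} s, mean latency: {mean_latency:.4f} s")
```

In an experiment of this kind, `pearson` would be applied to paired observations, such as the memory allocation or CPU utilization recorded for each configuration and the mean latency measured under it. A coefficient near -1 would indicate that latency falls as the hardware metric rises.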
