Deep Learning Model Compression Techniques Performance on Edge Devices

  • Rakandhiya Daanii Rachmanto, Universitas AMIKOM, Yogyakarta, Indonesia
  • Ahmad Naufal Labiib Nabhaan, Universitas AMIKOM, Yogyakarta, Indonesia
  • Arief Setyanto, Universitas AMIKOM, Yogyakarta, Indonesia
Keywords: Deep Learning, Edge Devices, Model Compression

Abstract

Artificial intelligence at the edge can help solve complex tasks in sectors such as automotive, healthcare, and surveillance. However, edge devices have limited computational power, so artificial intelligence models must be adapted to run on them. Many model compression approaches have been developed and quantified over the years to tackle this problem. Few studies, however, have considered the overhead of on-device model compression, even though compression can take a considerable amount of time. By adding this metric, we provide a more complete view of the efficiency of model compression on the edge. The objective of this research is to identify the benefit of compression methods and their tradeoff between size and latency reduction on one side and accuracy loss and compression time on the other, on edge devices. In this work, a quantitative method is used to analyze and rank three common model compression techniques (post-training quantization, unstructured pruning, and knowledge distillation) on the basis of accuracy, latency, model size, and time-to-compress overhead. We conclude that knowledge distillation performs best, with the potential for up to 11.4x model size reduction and a 78.67% latency speed-up, at the cost of a moderate loss of accuracy and a moderate compression time.
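
The four metrics named in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' benchmarking code: the function names, the toy weight list, and the pruning routine are hypothetical, and the latency metric reads the reported "latency speed up" as a percentage reduction in latency, which is an assumption the abstract does not confirm.

```python
import time


def size_reduction_factor(original_bytes, compressed_bytes):
    """How many times smaller the compressed model is (11.4 means 11.4x)."""
    return original_bytes / compressed_bytes


def latency_reduction_pct(baseline_ms, compressed_ms):
    """Share of baseline inference latency removed, in percent (assumed reading)."""
    return 100.0 * (baseline_ms - compressed_ms) / baseline_ms


def accuracy_loss_points(baseline_acc, compressed_acc):
    """Accuracy given up by compressing, in percentage points."""
    return baseline_acc - compressed_acc


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds): the time-to-compress overhead."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def prune_by_magnitude(weights, sparsity):
    """Toy unstructured pruning: zero the `sparsity` fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    toy_weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.01]  # hypothetical values
    pruned, overhead_s = timed(prune_by_magnitude, toy_weights, 0.5)
    print("pruned weights:", pruned)
    print("compression overhead (s):", overhead_s)
    # Hypothetical sizes and latencies, chosen only to exercise the formulas.
    print("size reduction factor:", size_reduction_factor(114.0, 10.0))
    print("latency reduction (%):", latency_reduction_pct(100.0, 21.33))
```

Wrapping the compression call in `timed` keeps the overhead measurement separate from the technique being measured, so the same harness could wrap a quantization or distillation routine instead of the toy pruning function.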

Published
2024-06-18
How to Cite
Rachmanto, R., Nabhaan, A., & Setyanto, A. (2024). Deep Learning Model Compression Techniques Performance on Edge Devices. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 23(3), 569-582. https://doi.org/10.30812/matrik.v23i3.3961
Section
Articles