Addressing Class Imbalance in Android Backdoor Malware Detection Using Ensemble Models

Rama Aria Megantara; Dewi Pergiwati; Farrikh Alzami; Ricardus Anggi Pramunendar; Dwi Puji Prabowo; Muhammad Naufal; Rivaldo Mersis Brilianto

doi:10.30812/matrik.v15i2.6198

Authors

Rama Aria Megantara, Universitas Dian Nuswantoro, Semarang, Indonesia
Dewi Pergiwati, Universitas Dian Nuswantoro, Semarang, Indonesia
Farrikh Alzami, Universitas Dian Nuswantoro, Semarang, Indonesia
Ricardus Anggi Pramunendar, Universitas Dian Nuswantoro, Semarang, Indonesia
Dwi Puji Prabowo, Universitas Dian Nuswantoro, Semarang, Indonesia
Muhammad Naufal, Universitas Dian Nuswantoro, Semarang, Indonesia
Rivaldo Mersis Brilianto, Pusan National University, Busan, Republic of Korea

DOI:

https://doi.org/10.30812/matrik.v15i2.6198

Keywords:

Android Malware Detection, Backdoor Classification, Class Imbalance, Static Analysis, Random Forest

Abstract

Backdoor malware is among the most critical threats in the Android ecosystem because it enables covert remote access, privilege escalation, and exfiltration of sensitive data without user awareness. Although the CCCS-CIC-AndMal-2020 dataset is publicly available, prior studies have neither formulated Backdoor detection as a binary classification problem under extreme class imbalance nor systematically evaluated the impact of oversampling and cost-sensitive weighting using imbalance-aware performance metrics. This study proposes a detection pipeline that integrates ensemble learning, class imbalance handling strategies, and explainability-based analysis to extract behavioral signatures of Backdoor malware. A two-stage feature selection process reduces the original 9,502-dimensional feature space to 500 informative features. Five classification algorithms are then evaluated under three imbalance-handling scenarios using a composite ranking criterion based on F1-score, Area Under the Receiver Operating Characteristic Curve (AUC), Geometric Mean (G-Mean), and Matthews Correlation Coefficient (MCC). The Random Forest model combined with the Synthetic Minority Oversampling Technique (SMOTE) achieves the best performance, with an F1-score of 0.9043, AUC of 0.9909, G-Mean of 0.9422, and MCC of 0.8948. Furthermore, SHapley Additive exPlanations (SHAP) analysis identifies 39 Android permissions related to account access, covert communication, and privilege escalation as key behavioral signatures, with the permissions feature group contributing 2.31 times higher discriminative importance than non-permission features. These findings indicate that interpretable ensemble learning not only improves detection performance but also provides actionable insights for static malware analysis.
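The four imbalance-aware metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`imbalance_metrics`, `auc_from_scores`, `composite_rank`) are hypothetical, and the equal-weight mean-rank rule is only an assumption about how a composite ranking criterion might combine the metrics, since the abstract does not specify the weighting.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return F1, G-Mean, and MCC from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity (TPR)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TNR
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-Mean balances the accuracy achieved on each class separately.
    g_mean = math.sqrt(recall * specificity)
    # MCC uses all four cells, so it stays informative under heavy imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "g_mean": g_mean, "mcc": mcc}


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite_rank(results):
    """Order candidates by mean rank across all metrics (best first).

    Assumption: every metric is weighted equally and higher is better.
    Ties within a metric are broken by insertion order.
    """
    names = list(results)
    metrics = list(next(iter(results.values())))
    mean_rank = {name: 0.0 for name in names}
    for metric in metrics:
        ordered = sorted(names, key=lambda n: results[n][metric], reverse=True)
        for position, name in enumerate(ordered, start=1):
            mean_rank[name] += position / len(metrics)
    return sorted(names, key=lambda n: mean_rank[n])
```

In use, each model and imbalance-handling scenario would contribute one dictionary of metric values, and `composite_rank` would return the candidates ordered from best to worst. The pairwise AUC is quadratic in the number of samples, so it suits small illustrations rather than full datasets.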


