Enhancement of Supervised Learning Models for Intrusion Detection Through Mutual Information and Hyperparameter Tuning

Deny Jollyta; Yoakhina Nicole Makaruku; Alyauma Hajjah; Yulvia Nora Marlim

doi:10.30812/matrik.v25i2.5760

Authors

Deny Jollyta, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia
Yoakhina Nicole Makaruku, Institut Agama Kristen Negeri, Ambon, Indonesia
Alyauma Hajjah, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia
Yulvia Nora Marlim, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia

Keywords:

Hyperparameter Tuning, Intrusion Detection, Mutual Information, Supervised Learning

Abstract

Improving the performance of supervised learning algorithms through feature selection and hyperparameter tuning remains challenging for users, particularly in computer network intrusion detection. There is room to assess whether a supervised learning algorithm performs optimally given the number of features used and the choice of hyperparameters. The purpose of this research is to enhance the network intrusion detection performance of three supervised learning algorithms, namely Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), and Random Forest, by combining Mutual Information feature selection with hyperparameter tuning. Mutual Information measures how strongly each feature depends on the target class; features with higher values are more informative. Hyperparameters are not learned from the data but are set before training begins. They were selected according to the requirements of each of the three algorithms through iterative training and testing on the NSL-KDD dataset, which was split into training and test sets at ratios of 80:20, 70:30, and 60:40. The fifteen features with the highest mutual information were identified, and the models were trained on them using appropriately tuned hyperparameters. With the 80:20 split, the accuracy of the Support Vector Machine reached its maximum, rising from 90% to 98%, while eXtreme Gradient Boosting and Random Forest both reached their maximum of 100%, up from 97% and 98%, respectively. These findings advance our understanding of how algorithm performance depends on feature and hyperparameter selection.
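The abstract describes two steps: ranking features by their mutual information with the target class and keeping the top fifteen, then choosing hyperparameters through repeated training and testing. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' code. It assumes discrete (or already binned) feature values and uses a simple plug-in estimate of mutual information in nats, so the continuous NSL-KDD features would need discretisation or a different estimator (scikit-learn's `mutual_info_classif`, for example, uses a nearest-neighbour estimator for continuous features). The helper names `mutual_information`, `select_top_k`, `grid_search`, and `toy_score` are hypothetical, and `toy_score` merely stands in for a real validation accuracy.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), with every probability estimated as count / n
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(rows, labels, k):
    """Indices of the k feature columns sharing the most mutual information with labels."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((mutual_information(column, labels), j))
    # Highest score first; ties broken by the lower column index
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. validation accuracy for these settings
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: column 0 copies the label, column 1 is constant, column 2 is partly informative
rows = [(0, 7, 0), (0, 7, 1), (1, 7, 1), (1, 7, 1)]
labels = [0, 0, 1, 1]
print(select_top_k(rows, labels, 2))  # [0, 2]


# Hypothetical stand-in for a validation score over SVM-style C and gamma values
def toy_score(params):
    return -(params["C"] - 1) ** 2 - (params["gamma"] - 0.1) ** 2


print(grid_search({"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, toy_score))
```

In a real pipeline `k` would be 15, matching the abstract, and `score_fn` would train a model (for example an SVM with a given `C` and `gamma`) on the training portion of an 80:20, 70:30, or 60:40 split and return its accuracy on held-out data. An exhaustive grid is the simplest search strategy; random or Bayesian search scale better when the grid is large.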

Author Biographies

Deny Jollyta, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia

Lecturer, Faculty of Computer Science and Informatics, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia

Yoakhina Nicole Makaruku, Institut Agama Kristen Negeri, Ambon, Indonesia

Lecturer, Institut Agama Kristen Negeri, Ambon, Indonesia

Alyauma Hajjah, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia

Lecturer, Institut Bisnis dan Teknologi Pelita Indonesia

Yulvia Nora Marlim, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru, Indonesia

Lecturer, Institut Bisnis dan Teknologi Pelita Indonesia

