K-Means-Based Customer Segmentation with Domain-Specific Feature Engineering for Water Payment Arrears Management
DOI:
https://doi.org/10.30812/matrik.v25i1.5186
Keywords:
Customer segmentation, Feature engineering, K-means clustering, Payment behavior analysis, Utility analytics
Abstract
Indonesian water utilities face persistent challenges in managing payment delinquencies due to diverse customer characteristics, geographic limitations, and inadequate analytical capabilities. Addressing this issue is essential to optimizing revenue collection and supporting sustainable operations. This study aims to develop a data-driven customer segmentation framework using K-means clustering to enhance delinquency management. The framework incorporates six engineered features (Debt Efficiency, Payment Behavior Score, Category Risk Score, Geographic Risk Score, Consumption Intensity, and Financial Risk Score) designed to capture customer payment behavior, consumption patterns, and geographic risk. We applied the model to 1,500 anonymized customer records from PT Air Minum Giri Menang, focusing on those with delinquencies exceeding four months. Risk scoring was based on quintile distribution, and the optimal number of clusters was determined through the elbow method combined with silhouette coefficient analysis. The results produced a two-cluster solution (silhouette score = 0.538), showing statistically significant differences across features (p < 0.001) and medium-to-large effect sizes (Cohen’s d = 0.52–2.12). The segmentation identified medium-risk customers (86.7%) who require preventive management and high-risk customers (13.3%) who need billing intervention. Urban areas exhibited higher delinquency risk (18.4%) than rural areas (2.5%), indicating the need for geographically targeted strategies. All customer data were anonymized following Indonesian data protection protocols. In conclusion, the proposed framework transforms manual billing supervision into an adaptive, data-driven management system, contributing to segmentation research by introducing utility-specific engineered features for Indonesian water utilities.
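The abstract names three techniques: quintile-based risk scoring, selection of the cluster count by elbow and silhouette analysis, and Cohen's d as an effect-size measure. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The data are synthetic, the six columns merely stand in for the engineered features, and the helper names `quintile_score`, `choose_k`, and `cohens_d` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def quintile_score(values):
    """Map each value to a 1-5 score according to the quintile it falls in."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(edges, values, side="left") + 1


def choose_k(X, k_values=range(2, 7), seed=0):
    """Fit K-means for each k; return the k with the highest silhouette.

    The per-k inertia is also returned so an elbow plot can be drawn.
    """
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled std."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Synthetic stand-in for six engineered features (assumption: the real
# features come from billing records, which are not reproduced here).
rng = np.random.default_rng(0)
low_risk = rng.normal(0.0, 1.0, size=(260, 6))
high_risk = rng.normal(4.0, 1.0, size=(40, 6))
X = StandardScaler().fit_transform(np.vstack([low_risk, high_risk]))

best_k, results = choose_k(X)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

for k, (inertia, sil) in results.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
print("selected k:", best_k)
print("Cohen's d on feature 0:", cohens_d(X[labels == 0, 0], X[labels == 1, 0]))
```

Standardizing before clustering matters because K-means relies on Euclidean distance, so a feature on a larger scale would otherwise dominate the assignment. The sign of Cohen's d here depends on which cluster happens to be labelled 0, so only its magnitude is meaningful.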
Copyright (c) 2025 Andi Hary Akbar, Heri Wijayanto, I Wayan Agus Arimbawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.