K-Means-Based Customer Segmentation with Domain-Specific Feature Engineering for Water Payment Arrears Management
DOI:
https://doi.org/10.30812/matrik.v25i1.5186
Keywords:
Customer segmentation, Feature engineering, K-means clustering, Payment behavior analysis, Utility analytics
Abstract
Indonesian water utilities face persistent challenges in managing payment delinquencies due to diverse customer characteristics, geographic limitations, and inadequate analytical capabilities. Addressing this issue is essential to optimizing revenue collection and supporting sustainable operations. This study aims to develop a data-driven customer segmentation framework using K-means clustering to enhance delinquency management. The framework incorporates six engineered features (Debt Efficiency, Payment Behavior Score, Category Risk Score, Geographic Risk Score, Consumption Intensity, and Financial Risk Score) designed to capture customer payment behavior, consumption patterns, and geographic risk. We applied the model to 1,500 anonymized customer records from PT Air Minum Giri Menang, focusing on those with delinquencies exceeding four months. Risk scoring was based on quintile distribution, and optimal clustering was determined through the elbow method combined with silhouette coefficient analysis. The results produced a two-cluster solution (silhouette score = 0.538), showing statistically significant differences across features (p < 0.001) and medium-to-large effect sizes (Cohen’s d = 0.52–2.12). The segmentation identified medium-risk customers (86.7%) who require preventive management and high-risk customers (13.3%) who need billing intervention. Urban areas exhibited higher delinquency risk (18.4%) than rural areas (2.5%), indicating the need for geographically targeted strategies. All customer data was anonymized following Indonesian data protection protocols. In conclusion, the proposed framework transforms manual billing supervision into an adaptive, data-driven management system, contributing to segmentation research by introducing utility-specific engineered features for Indonesian water utilities.
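The abstract names quintile-based risk scoring, cluster-count selection by silhouette analysis, and Cohen's d effect sizes, but does not give the computations. The Python sketch below shows one plausible reading of those three steps on synthetic data. It is not the authors' code: the column names (arrears_months, monthly_m3), the helper function names, the synthetic distributions, and the candidate range of k are all assumptions made for illustration. The six engineered features are not reproduced because their formulas are not stated in the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def quintile_score(values: pd.Series) -> pd.Series:
    """Map each value to a 1-5 score by quintile (5 = highest quintile)."""
    # Ranking with method="first" breaks ties, so qcut always finds
    # five distinct bin edges even when many values are identical.
    ranks = values.rank(method="first")
    return pd.qcut(ranks, q=5, labels=False) + 1


def pick_k_by_silhouette(X, k_range=range(2, 7), seed=0):
    """Fit K-means for each candidate k; return (best_k, {k: silhouette})."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled std. dev."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Synthetic stand-in data (assumption): two hypothetical customer groups
# with different arrears length and monthly consumption.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "arrears_months": np.concatenate(
        [rng.normal(5, 0.5, 260), rng.normal(12, 0.5, 40)]
    ),
    "monthly_m3": np.concatenate(
        [rng.normal(15, 2, 260), rng.normal(40, 2, 40)]
    ),
})

# Quintile-based score for one hypothetical raw feature.
df["arrears_quintile"] = quintile_score(df["arrears_months"])

# Standardize, choose k by silhouette, then fit the final model.
X = StandardScaler().fit_transform(df[["arrears_months", "monthly_m3"]])
best_k, scores = pick_k_by_silhouette(X)
df["cluster"] = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# Effect size between the two largest clusters for one feature.
top_two = df["cluster"].value_counts().index[:2]
d = cohens_d(
    df.loc[df["cluster"] == top_two[0], "arrears_months"],
    df.loc[df["cluster"] == top_two[1], "arrears_months"],
)
print(f"best k = {best_k}, silhouette = {scores[best_k]:.3f}, |d| = {abs(d):.2f}")
```

The sketch covers only the silhouette half of the selection procedure. For the elbow check one would additionally record the fitted model's inertia_ for each k and look for the bend in that curve.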
License
Copyright (c) 2025 Andi Hary Akbar, Heri Wijayanto, I Wayan Agus Arimbawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.