K-Means-Based Customer Segmentation with Domain-Specific Feature Engineering for Water Payment Arrears Management

Authors

  • Andi Hary Akbar Universitas Mataram, Mataram, Indonesia
  • Heri Wijayanto Universitas Mataram, Mataram, Indonesia
  • I Wayan Agus Arimbawa Universitas Mataram, Mataram, Indonesia

DOI:

https://doi.org/10.30812/matrik.v25i1.5186

Keywords:

Customer segmentation, Feature engineering, K-means clustering, Payment behavior analysis, Utility analytics

Abstract

Indonesian water utilities face persistent challenges in managing payment delinquencies due to diverse customer characteristics, geographic limitations, and inadequate analytical capabilities. Addressing this issue is essential to optimizing revenue collection and supporting sustainable operations. This study develops a data-driven customer segmentation framework that uses K-means clustering to enhance delinquency management. The framework incorporates six engineered features (Debt Efficiency, Payment Behavior Score, Category Risk Score, Geographic Risk Score, Consumption Intensity, and Financial Risk Score) designed to capture customer payment behavior, consumption patterns, and geographic risk. We applied the model to 1,500 anonymized customer records from PT Air Minum Giri Menang, focusing on customers with delinquencies exceeding four months. Risk scoring was based on quintile distribution, and the optimal number of clusters was determined through the elbow method combined with silhouette coefficient analysis. The results produced a two-cluster solution (silhouette score = 0.538), showing statistically significant differences across features (p < 0.001) and medium-to-large effect sizes (Cohen’s d = 0.52–2.12). The segmentation identified medium-risk customers (86.7%) who require preventive management and high-risk customers (13.3%) who need billing intervention. Urban areas exhibited higher delinquency risk (18.4%) than rural areas (2.5%), indicating the need for geographically targeted strategies. All customer data were anonymized following Indonesian data protection protocols. In conclusion, the proposed framework transforms manual billing supervision into an adaptive, data-driven management system, contributing to segmentation research by introducing utility-specific engineered features for Indonesian water utilities.
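The steps named in the abstract (quintile-based risk scoring, choosing the number of clusters by silhouette coefficient, and reporting Cohen's d between clusters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the synthetic two-dimensional data, and the pure-Python K-means are hypothetical and are not the authors' implementation, feature definitions, or data. The elbow method is not shown; only the silhouette criterion is used to pick k.

```python
import math
import random
import statistics


def quintile_scores(values):
    """Map each value to a 1-5 score according to the quintile it falls in."""
    cuts = statistics.quantiles(values, n=5)  # four cut points
    return [1 + sum(v > c for c in cuts) for v in values]


def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (O(n^2) distances)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            dists = [
                math.dist(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            if dists:
                mean_dist[c] = statistics.fmean(dists)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: coefficient taken as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]  # mean distance to own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)


def choose_k(points, k_values):
    """Return (best_k, {k: silhouette}) using the highest mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def make_demo_points(seed=42):
    """Synthetic data only: two well-separated 2-D blobs (60 and 20 points)."""
    rng = random.Random(seed)
    low = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    high = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(20)]
    return low + high


points = make_demo_points()
best_k, scores = choose_k(points, range(2, 6))
labels = kmeans(points, best_k)
group0 = [p[0] for p, lab in zip(points, labels) if lab == 0]
group1 = [p[0] for p, lab in zip(points, labels) if lab == 1]
print("best k:", best_k)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("Cohen's d on first feature:", round(cohens_d(group0, group1), 2))
print("quintile scores:", quintile_scores([p[0] for p in points])[:10])
```

In practice a library implementation (for example scikit-learn's `KMeans` and `silhouette_score`) would normally replace the hand-written functions, and features would usually be standardized before clustering; the sketch avoids external dependencies only to stay self-contained.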



Published

2025-11-21

Issue

Section

Articles

How to Cite

[1]
A. H. Akbar, H. Wijayanto, and I. W. A. Arimbawa, “K-Means-Based Customer Segmentation with Domain-Specific Feature Engineering for Water Payment Arrears Management”, MATRIK, vol. 25, no. 1, pp. 39–52, Nov. 2025, doi: 10.30812/matrik.v25i1.5186.
