Survival Analysis with Cox Proportional Hazard Model for Tuberculosis (TBC) Patients

Survival analysis is a statistical method for analyzing the relationship between the time from the start of observation until an event occurs (the response variable) and the factors that influence that event (the predictor variables). When the response variable is the time until an event occurs, one method for determining its relationship with the predictor variables is Cox proportional hazard regression. This research uses data on hospitalized tuberculosis (TBC) patients at Haji Makassar Hospital in 2022, whose characteristics suit the aim of survival analysis, namely to determine the relationship between the survival time of TBC patients and the factors that influence the disease. The analysis shows that the factors that significantly influence the recovery rate of TBC patients are shortness of breath and smoking habit. The hazard ratio for shortness of breath is 0.3506, which means that, under the coding used (1 = symptom present), TBC patients who experience shortness of breath recover at 0.3506 times the rate of patients who do not. The hazard ratio for smoking habit is 0.7367, which means that patients who smoke recover at 0.7367 times the rate of patients who do not smoke.


A. INTRODUCTION
Survival analysis is a statistical method for analyzing data in which the response variable is the time from the start of observation until an event occurs. An event is a particular occurrence that can happen to an individual, such as recovery, recurrence of a disease, or death. The time to an event can be expressed in years, months, weeks, or days (Fernandes et al., 2016). The purpose of survival analysis is to determine the relationship between the time to the event (response variable) and the predictor variables measured in the study. It is also used to identify which factors have a significant influence on an event (Annas et al., 2018; Poerwanto et al., 2018). A characteristic of survival data is that it often cannot be observed completely; such data are called censored data. Censoring arises from the limited observation period or from individuals leaving the study (Turkson et al., 2021).
The relationship between the response variable and the predictor variables can be examined using regression analysis (Dewi et al., 2020; Bustan and Poerwanto, 2021). Regression models that can be used in survival analysis include parametric, nonparametric, and semiparametric models. Of the three, the semiparametric model offers better flexibility because some of the variables follow a common distribution. In addition, the semiparametric model is a safe choice when the appropriate parametric model is in doubt, and its estimates are almost the same as those obtained with the parametric model (Chandra and Rohmaniah, 2019).
The Cox regression model is a semiparametric regression model that relies on the proportional hazard assumption, so it is commonly known as the Cox proportional hazard regression model (Dukalang, 2019; Fa'rifah and Poerwanto, 2019). The method estimates the hazard ratio, which compares the hazard in one group of individuals with that in another group, provided the proportional hazard assumption is fulfilled. It can also estimate the regression parameters well even though the baseline hazard is unknown (Fernandes et al., 2016).
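For orientation, the general form of the model can be sketched as follows. The notation ($h_0(t)$ for the baseline hazard, $\beta_1, \dots, \beta_p$ for the regression coefficients, $x_a$ and $x_b$ for the covariate vectors of two individuals) is the conventional textbook notation and is assumed here to match the form of the final model reported in the conclusion.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)

HR = \frac{h(t \mid x_a)}{h(t \mid x_b)}
   = \exp\!\big(\beta_1 (x_{a1} - x_{b1}) + \dots + \beta_p (x_{ap} - x_{bp})\big)
```

Because $h_0(t)$ cancels in the ratio, the hazard ratio does not depend on time and can be estimated without specifying the baseline hazard.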
Survival analysis has received attention in the health and medical fields, where it can be used to analyze patients' survival time with a disease (Faisal et al., 2020). This study uses TBC patient data from Haji Makassar Hospital in 2022. Tuberculosis is an infectious disease caused by the bacterium Mycobacterium tuberculosis. The germs are transmitted through the air in droplets of phlegm when TBC sufferers talk, sneeze, or cough. Nearly a quarter of the world's population is infected with Mycobacterium tuberculosis; about 89% of TBC cases affect adults and 11% affect children. Tuberculosis (TBC) remains a public health problem both in Indonesia and internationally, so it has become one of the health targets of the Sustainable Development Goals (SDGs). TBC is still the highest cause of death after HIV/AIDS and is one of the 20 main causes of death worldwide (Ministry of Health of the Republic of Indonesia, 2022).
According to the WHO 2021 Global Tuberculosis Report, an estimated 9.9 million people worldwide suffered from TBC in 2020. The number of deaths from tuberculosis globally in 2020 was 1.3 million, an increase from 1.2 million in 2019. In 2020 the TBC incidence in Indonesia was 301 per 100,000 population. Indonesia has the third highest TBC case rate in the world after India and China (Ministry of Health of the Republic of Indonesia, 2022). In South Sulawesi Province in 2020, the number of TBC sufferers across all regencies/cities was 18,863, comprising 11,095 men and 7,768 women. Makassar City had the most TBC cases in South Sulawesi Province with 5,421 sufferers, followed by Gowa District with 1,810 sufferers (South Sulawesi Health Service, 2021).
Survival analysis using the Cox proportional hazard regression model has been carried out by several researchers to analyze the rate of recovery of patients from a disease. Ranstam and Robertsson (2017) compared the Cox regression model and the Fine and Gray model to determine the risk factors that influence the survival of arthroplasty patients. In that study, Cox regression gave better results than the Fine and Gray model, because parameter estimates from the Fine and Gray model can be misleading if interpreted in terms of relative risk. Another study using Cox proportional hazard regression was conducted by Royston and Altman (2013). It found that patient survival from a disease can be treated as survival data, so survival analysis with Cox regression applies: a mathematical modelling approach that describes the survival curve while considering several risk factors simultaneously.
Based on this background, the purpose of this study is to identify the factors that influence the recovery rate of TBC patients at Haji Makassar Hospital using survival analysis with Cox proportional hazard regression. The hope is that the results can help curb the increasing number of cases.

B. RESEARCH METHOD
1. Data Source
The data used in this study are secondary data covering the variables studied, namely the factors that affect the recovery rate of TBC patients at Haji Makassar Hospital in 2022. Data collection yielded 184 TBC patients treated at Haji Makassar Hospital from January to December 2022. After testing the assumptions of the Cox proportional hazard regression method, only 170 of these patient records met the assumptions. Thus, out of the population of 184 TBC patients at Haji Makassar Hospital in 2022, a sample of 170 was analyzed.

Table 1. Variable Operational Definition
Variable | Variable Name | Description
Y | Survival Time | Length of time a TBC patient was treated (inpatient and/or outpatient) at the hospital, in days
S | Status | 0 = Failure (absent/died during observation); 1 = Success (healed)
 | Cough with Phlegm | 0 = Not coughing up phlegm; 1 = Cough with phlegm
X4 | Hard to Breathe | 0 = Not hard to breathe; 1 = Hard to breathe
X5 | Night Sweat | 0 = No night sweats; 1 = Night sweats
X6 | Smoking Habit | 0 = Does not smoke; 1 = Smokes
X7 | History Status | 0 = New case; 1 = Relapse case
X8 | Drug Status | 0 = Does not take medicine regularly; 1 = Takes medicine regularly
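The encoding of Y and S in Table 1 can be illustrated with a short Python sketch. The function name `encode_patient`, the record layout, and the dates are hypothetical and are not taken from the hospital data. The sketch assumes only that survival time runs from admission to the end of treatment or observation, and that any patient without an observed recovery is right-censored (S = 0).

```python
from datetime import date

def encode_patient(admitted, ended, recovered, study_end):
    """Return (Y, S): days under treatment and event status.

    S = 1 when recovery was observed; S = 0 (right-censored) when the
    patient died, left, or was still under treatment at study end.
    """
    # A missing end date means the patient was still treated at study end.
    last_seen = min(ended, study_end) if ended is not None else study_end
    days = (last_seen - admitted).days
    observed = recovered and ended is not None and ended <= study_end
    return days, 1 if observed else 0

# Hypothetical records: (admission date, end date, recovered?)
study_end = date(2022, 12, 31)
records = [
    (date(2022, 3, 1), date(2022, 3, 8), True),     # recovered after 7 days
    (date(2022, 5, 10), date(2022, 5, 14), False),  # died or left: censored
    (date(2022, 12, 28), None, False),              # still treated: censored
]
encoded = [encode_patient(a, e, r, study_end) for a, e, r in records]
print(encoded)
```

Each pair (Y, S) is what the later estimation steps consume: the time enters the risk sets, and the status decides whether the patient contributes an event term.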

Research Procedure
The research procedure is shown in Figure 1 and consists of the following steps.
1. Preparing the patient survival time data, namely determining the starting point, end point, and time measurement scale of the study. The starting point was the admission of the TBC patient to the hospital, and the end point was the determination of the event status. The time measurement scale is days. Right censoring was used, with recovery as the observed event.
2. Descriptive analysis of the predictor variables and the response variable.
3. Testing the survival time distribution using the Kolmogorov-Smirnov test (Chandra and Rohmaniah, 2019).
4. Testing the proportional hazard assumption using the Goodness of Fit (GOF) test. The GOF method uses a test statistic based on Schoenfeld residuals (Istuti et al., 2019), in which $x_{ij}$ is the value of the j-th independent variable for the individual who experienced the event at time $t_i$.
5. Estimating the parameters of the Cox proportional hazard regression model using maximum partial likelihood estimation (MPLE) (Pertiwi and Purnami, 2020):
$$L(\beta) = \prod_{i=1}^{r} \frac{\exp(\beta' x_i)}{\sum_{l \in R_{t_i}} \exp(\beta' x_l)}$$
where $L(\beta)$ is the partial likelihood function of the parameter under the exact likelihood method, $r$ is the number of individuals who experience a success event, $\beta$ is the vector of regression coefficients, $x_i$ is the variable vector of the individual who experienced the event at time $t_i$, $x_l$ is the variable vector of an individual who is still under observation and is an element of $R_{t_i}$, and $R_{t_i}$ is the set of individuals still under observation at time $t_i$. The likelihood function is then transformed to its natural logarithm, and its first derivative is set equal to zero. The Newton-Raphson method is used to maximize the partial likelihood function when estimating $\beta$.
6. Testing the significance of the parameters of the Cox proportional hazard regression model using (a) a simultaneous test and (b) a partial test (Fajarini and Fatekurohman, 2018).
7. Forming the final Cox proportional hazard regression model with the significant variables.
8. Interpreting the hazard ratios from the final Cox proportional hazard regression model.
9. Drawing conclusions from the entire data analysis.
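The estimation step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a single binary covariate, uses toy data that are entirely hypothetical, and contains no tied event times, so the tie-handling method does not come into play. All function names are illustrative.

```python
import math

def risk_sums(beta, times, x, t):
    """Sums over the risk set {l : times[l] >= t} of w, w*x and w*x^2."""
    s0 = s1 = s2 = 0.0
    for tl, xl in zip(times, x):
        if tl >= t:
            w = math.exp(beta * xl)
            s0 += w
            s1 += w * xl
            s2 += w * xl * xl
    return s0, s1, s2

def log_partial_likelihood(beta, times, events, x):
    """ln L(beta): one term per observed event."""
    total = 0.0
    for ti, di, xi in zip(times, events, x):
        if di == 1:
            s0, _, _ = risk_sums(beta, times, x, ti)
            total += beta * xi - math.log(s0)
    return total

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of ln L(beta)."""
    score = info = 0.0
    for ti, di, xi in zip(times, events, x):
        if di == 1:
            s0, s1, s2 = risk_sums(beta, times, x, ti)
            mean = s1 / s0
            score += xi - mean
            info += s2 / s0 - mean * mean
    return score, info

def fit_cox_newton(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson: repeat beta <- beta + score / information."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: days, status (1 = recovered), covariate (1 = present)
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 0, 1, 1, 1, 0, 1]
x = [0, 1, 0, 1, 0, 0, 1, 1]

beta_hat = fit_cox_newton(times, events, x)
print(f"beta = {beta_hat:.4f}, hazard ratio = {math.exp(beta_hat):.4f}")
```

At convergence the score is zero, which is the "first derivative set equal to zero" condition described in step 5.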

C. RESULTS AND DISCUSSION
1. Descriptive Analysis
The survival analysis involved a sample of 170 from the population of 184 TBC patients at Haji Makassar Hospital in 2022. The shortest treatment lasted 1 day (13 patients) and the longest 19 days (1 patient), with an average of 7 days. Regarding event status, 141 patients (83%) experienced the success event. The average age of the TBC patients was 47 years; the oldest patient was 84 and the youngest 17. The majority of patients were male (65%), and a smoking habit (59%) was correspondingly common. Most patients experienced coughing up blood (55%), shortness of breath (97%), and night sweats (72%). By history status, the majority of TBC patients are reported as new cases (6%), and most patients took their medication regularly (83%).
From Table 3, each distribution tested has a p-value < 0.05, so H0 is rejected, meaning that the survival time data do not follow any of the tested distributions. As previously explained, Cox proportional hazard regression is a semiparametric model that does not require specific information about the distribution underlying the survival time, so the analysis can proceed with the Cox proportional hazard regression method.
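The statistic behind the distribution test can be sketched as below. The paper does not state which distributions were tested, so the exponential distribution with a mean of 7 days is an assumption made only for illustration, and the durations are hypothetical values within the reported range of 1 to 19 days. The function name `ks_statistic` is illustrative.

```python
import math

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and a hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, value in enumerate(xs, start=1):
        f0 = cdf(value)
        # The empirical CDF jumps at each observation, so the hypothesized
        # CDF is compared with both the top and the bottom of the step.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

def exponential_cdf(mean):
    """CDF of an exponential distribution with the given mean."""
    return lambda t: 1.0 - math.exp(-t / mean)

# Hypothetical treatment durations in days
durations = [1, 3, 4, 6, 7, 7, 9, 12, 15, 19]
d_exp = ks_statistic(durations, exponential_cdf(7.0))
print(f"D = {d_exp:.4f}")
```

A large D relative to the critical value for the sample size leads to rejecting the hypothesized distribution, which is the outcome reported in Table 3 for every distribution tested.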

Proportional Hazard Assumption Testing
The proportional hazard assumption is considered fulfilled if the p-value exceeds the 0.05 significance level, with the hypotheses:
H0: The data meet the proportional hazard assumption.
H1: The data do not meet the proportional hazard assumption.
Based on Table 4, each variable has a p-value > 0.05, so H0 is not rejected. The variables therefore meet the proportional hazard assumption; in other words, the risk effect of each variable between one individual and another is constant over time, which makes the results easy to interpret.
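A minimal sketch of this kind of check follows, assuming the common formulation in which the Schoenfeld residuals of a predictor are correlated with the ranks of the event times; a correlation near zero is consistent with proportional hazards. The coefficient `beta`, the toy data, and the function names are hypothetical, only one covariate is handled, and the p-value computation is omitted.

```python
import math

def schoenfeld_residuals(beta, times, events, x):
    """Return (event_time, residual) pairs for observed events, sorted by time."""
    pairs = []
    for ti, di, xi in zip(times, events, x):
        if di != 1:
            continue  # censored observations contribute no residual
        num = den = 0.0
        for tl, xl in zip(times, x):
            if tl >= ti:  # subject l is still at risk at time ti
                w = math.exp(beta * xl)
                num += w * xl
                den += w
        # Observed covariate minus its weighted mean over the risk set
        pairs.append((ti, xi - num / den))
    return sorted(pairs)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy data: days, status (1 = recovered), covariate (1 = present)
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 0, 1, 1, 1, 0, 1]
x = [0, 1, 0, 1, 0, 0, 1, 1]
beta = -0.7  # hypothetical fitted coefficient

pairs = schoenfeld_residuals(beta, times, events, x)
residuals = [res for _, res in pairs]
ranks = list(range(1, len(pairs) + 1))  # valid because event times are distinct
correlation = pearson(residuals, ranks)
print(f"correlation between residuals and time ranks = {correlation:.4f}")
```

In practice the correlation is converted to a test statistic and a p-value, and the assumption is retained when that p-value exceeds 0.05, as in Table 4.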

Parameter Significance Test
1. Simultaneous Test
In the simultaneous test, H0 is rejected if the p-value is below the 0.05 significance level. The hypotheses are:
H0: $\beta_1 = \beta_2 = \dots = \beta_p = 0$ (none of the predictor variables is significant)
H1: at least one $\beta_h \neq 0$, with h = 1, 2, ..., p (at least one predictor variable is significant).
Based on Table 5, the likelihood ratio p-value is 0.0026 < 0.05, so H0 is rejected: at least one predictor variable has a significant effect on the response variable in the model.

2. Partial Test
In the partial test, H0 is rejected if the p-value is below the 0.05 significance level. The hypotheses, for h = 1, 2, ..., p, are:
H0: $\beta_h = 0$ (the h-th predictor variable is not significant)
H1: $\beta_h \neq 0$ (the h-th predictor variable is significant).
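The partial test can be sketched with the Wald statistic, the squared ratio of a coefficient estimate to its standard error, compared with a chi-square distribution with one degree of freedom. The standard error used below is hypothetical because the paper's table values are not reproduced here; only the coefficient -1.0480 comes from the final model. The function name `wald_test` is illustrative.

```python
import math

def wald_test(beta_hat, standard_error):
    """Return the Wald statistic W^2 and its chi-square (df = 1) p-value."""
    w2 = (beta_hat / standard_error) ** 2
    # For one degree of freedom, P(chi-square > w2) = erfc(sqrt(w2 / 2)).
    p_value = math.erfc(math.sqrt(w2 / 2.0))
    return w2, p_value

# Coefficient from the final model; the standard error is a made-up value.
w2, p_value = wald_test(-1.0480, 0.45)
decision = "reject H0" if p_value < 0.05 else "do not reject H0"
print(f"W^2 = {w2:.4f}, p-value = {p_value:.4f}, decision: {decision}")
```

The same function applies to each predictor in turn, which is how the partial test singles out the significant variables.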
Based on Table 5, the predictor variables shortness of breath and smoking habit have p-values < 0.05, so H0 is rejected for each: both predictor variables have a significant effect on the recovery rate of TBC patients at Haji Makassar Hospital. Based on the parameter estimation results in Table 6, the final Cox proportional hazard regression model is obtained as follows:
$$h(t) = h_0(t)\exp(-1.0480X_4 - 0.3056X_6)$$

Final Cox Proportional Hazard Regression Model with Significant Variables
From the Cox proportional hazard model with the significant variables, another simultaneous test was carried out to determine whether the model is significant. Based on Table 6, the likelihood ratio p-value is 0.0340 < 0.05, so H0 is rejected: at least one predictor variable is significant, and the Cox proportional hazard model is therefore significant.
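The simultaneous test can be sketched as a likelihood ratio comparison between the model without predictors and the model with them. The log-likelihood values below are hypothetical. The degrees of freedom are assumed to equal the number of predictors in the final model (two), and the closed-form tail probability used here is valid only for an even number of degrees of freedom. Function names are illustrative.

```python
import math

def likelihood_ratio_statistic(loglik_null, loglik_full):
    """-2 * (ln L of the model without variables - ln L of the model with them)."""
    return -2.0 * (loglik_null - loglik_full)

def chi2_sf_even_df(stat, df):
    """P(chi-square with df degrees of freedom > stat), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical log partial likelihoods for the null and two-variable models
lr_stat = likelihood_ratio_statistic(-620.0, -616.6)
p_value = chi2_sf_even_df(lr_stat, df=2)
print(f"G = {lr_stat:.3f}, p-value = {p_value:.4f}")

# Statistic implied by the reported p-value of 0.0340, if df = 2 is assumed
implied_stat = -2.0 * math.log(0.0340)
print(f"implied G = {implied_stat:.3f}")
```

With two degrees of freedom the tail probability reduces to exp(-G/2), which is why the reported p-value can be mapped back to a statistic under that assumption.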

Hazard Ratio Significant Variable
Based on Table 6, the hazard ratio for the shortness of breath variable is 0.3506 and for the smoking habit variable is 0.7367. A hazard ratio of less than 1 indicates that the factor lowers the rate at which the event occurs (Chandra and Rohmaniah, 2019); here the event is recovery. Because both variables are coded 1 when the condition is present (Table 1), TBC patients who experience shortness of breath recover at 0.3506 times the rate of those who do not, and patients who smoke recover at 0.7367 times the rate of those who do not smoke.
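The hazard ratios follow directly from the coefficients of the final model, as the short check below shows. The coefficients are those reported in the paper; the dictionary layout and variable names are illustrative. The combined ratio assumes, as the model does, that the two effects multiply.

```python
import math

# Coefficients of the final model reported in the paper
coefficients = {
    "X4 (shortness of breath)": -1.0480,
    "X6 (smoking habit)": -0.3056,
}

# Hazard ratio = exp(coefficient): condition present versus absent
hazard_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, hr in hazard_ratios.items():
    print(f"{name}: hazard ratio = {hr:.4f}, reciprocal = {1.0 / hr:.4f}")

# Patient with both conditions versus a patient with neither
combined = math.exp(sum(coefficients.values()))
print(f"both conditions: hazard ratio = {combined:.4f}")
```

The reciprocal expresses the same comparison in the other direction, that is, the recovery rate of patients without the condition relative to those with it.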
This study is consistent with research conducted by Tamara et al. (2021), published in a medical journal, in which shortness of breath and smoking habits have an influence on TBC disease. TBC patients have various complaints that can cause breathing problems, one of which is shortness of breath due to airway obstruction caused by TBC germs. Shortness of breath occurs in advanced TBC disease, where the infiltration has covered half of the lungs. Smoking habits increase the risk of suffering from TBC because toxic substances such as tar inhaled in cigarette smoke can impair mucociliary clearance, the main defense mechanism of the airways, and can cause changes in the structure and function of the airways and lung tissue as well as in the host's immunological response to infection.

D. CONCLUSION AND SUGGESTION
Based on the results and discussion of the TBC patient data at Haji Makassar Hospital in 2022, the final model was obtained by estimating the parameters, testing the significance of the parameters of all variables, retaining the significant variables, and eliminating those that are not significant. The parameters were then re-estimated and their significance tested using only the previously significant variables. This yields the final Cox proportional hazard regression model with significant variables: $h(t) = h_0(t)\exp(-1.0480X_4 - 0.3056X_6)$. The model contains two significant variables, namely $X_4$ (shortness of breath) and $X_6$ (smoking habit). The hazard ratio for shortness of breath is $e^{-1.0480} = 0.3506$ and for smoking habit is $e^{-0.3056} = 0.7367$. A hazard ratio of less than 1 indicates that the factor lowers the rate at which the event, here recovery, occurs. Because both variables are coded 1 when the condition is present, TBC patients who experience shortness of breath recover at 0.3506 times the rate of those who do not, and patients who smoke recover at 0.7367 times the rate of those who do not smoke.
Further research should prioritize complete data to reduce censoring and should add predictor variables related to the social and environmental conditions of tuberculosis patients, because these conditions are also considered to influence the success of tuberculosis treatment; examples include knowledge about TBC and the conditions of the home and surrounding environment. People who smoke and experience shortness of breath should see a doctor promptly to avoid TBC disease, and those who have been infected should take regular treatment and avoid things that can make the disease worse.

Figure 1. Research Procedure Flowchart
Kolmogorov-Smirnov test statistic: $D = \sup_x |F_n(x) - F_0(x)|$, where $D$ is the Kolmogorov-Smirnov test statistic, $F_n(x)$ is the empirical distribution function of the data, and $F_0(x)$ is the hypothesized distribution function.
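(a) Simultaneous Test: $G = -2\ln\left(\frac{L_R}{L_F}\right)$, where $L_R$ is the value of the likelihood function in the model without variables and $L_F$ is the value of the likelihood function in the model with variables.
(b) Partial Test: $W^2 = \left(\frac{\hat{\beta}_h}{SE(\hat{\beta}_h)}\right)^2$, where $\hat{\beta}_h$ is the parameter estimate of the h-th predictor variable and $SE(\hat{\beta}_h)$ is the standard error of the parameter estimate of the h-th predictor variable.
JURNAL VARIAN | e-ISSN: 2581-2017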

Figure 2. Survival Time Histogram
Figure 3.
Notation for the Schoenfeld residual test: $R_{ij}$: Schoenfeld residual of the j-th independent variable for the individual who experienced the event at time $t_i$; $\bar{R}_{ij}$: average Schoenfeld residual of the j-th independent variable for the individual who experienced the event at time $t_i$; $RT_i$: rank of survival time for the i-th individual; $\overline{RT}_i$: average rank of survival time for the i-th individual; $\delta_i$: censoring indicator; $\hat{\beta}$: maximum likelihood estimator of $\beta$; $x_{ij}$: value of the j-th independent variable for the individual who experienced the event at time $t_i$.

Table 3. Survival Time Distribution Test

Table 4. Proportional Hazard Assumption Testing

Table 5. Parameter Estimation of the Cox Proportional Hazard Regression Estimator Model

Table 6. Final Cox Proportional Hazard Regression Model with Significant Variables