What drives customer churn?

An interactive R learning exercise

Author

Bongani Ncube

Published

5 March 2026

Note

Customer churn (also known as customer attrition) refers to the phenomenon where customers stop using a product or service provided by a company. It’s essential for businesses to understand why customers leave and take steps to reduce churn.

The goal of this customer churn project is to analyze historical data and identify patterns that lead to churn. By understanding these patterns, businesses can take targeted actions to retain customers.

Load in the dataset

Look at the structure of the data
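
The file name of the dataset is not given here, so the following is only a minimal sketch of the load-and-inspect step. It assumes a CSV file; the toy rows, the column names and the object name churn_raw are all invented for illustration.

```r
# Toy stand-in for the real churn file (all values invented for illustration)
toy <- data.frame(
  customerID     = c("A1", "B2", "C3", "D4"),
  SeniorCitizen  = c(0, 1, 0, 0),
  Contract       = c("Month-to-month", "One year", "Two year", "Month-to-month"),
  MonthlyCharges = c(70.5, 55.2, 20.1, 89.9),
  TotalCharges   = c(141.0, 662.4, NA, 89.9),
  Churn          = c("Yes", "No", "No", "Yes")
)

# Write it to a temporary CSV so the read step mirrors loading a real file
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Load the dataset; text columns stay as characters at this stage
churn_raw <- read.csv(path, stringsAsFactors = FALSE)

# Inspect the structure: dimensions first, then column types and first values
dim(churn_raw)
str(churn_raw)
```

With the real file, dim() gives the row and column counts and str() shows which columns were read as characters.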

Note
  • The dataset has ____ rows and ____ columns.
  • We need to tidy the data a little to fix some anomalies in it. Specifically, we need to:
    • change characters to factors
    • change the naming convention of our variables

Create clean names for our variables

  • We can use the clean_names() function from the janitor package to get more standardized, clean names.
  • Let us compare the old variable names with the new ones.

Old names

New names
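
As a rough sketch of the renaming step, the snippet below applies clean_names() to a small made-up data frame and puts the old and new names side by side. The raw column names are assumptions chosen to resemble mixed-case names; the real dataset may differ.

```r
library(janitor)

# Hypothetical raw data with mixed-case column names (values invented)
churn_raw <- data.frame(
  customerID     = c("A1", "B2"),
  SeniorCitizen  = c(0, 1),
  MonthlyCharges = c(70.5, 55.2),
  TotalCharges   = c(141.0, 662.4),
  Churn          = c("Yes", "No")
)

# Keep the old names for comparison
old_names <- names(churn_raw)

# clean_names() converts the names to snake_case by default
churn_clean <- clean_names(churn_raw)
new_names <- names(churn_clean)

# Old and new names side by side
data.frame(old = old_names, new = new_names)
```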

Change characters to factors

  • To do this we can use the mutate_if() function from the dplyr package.
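
A minimal sketch of this conversion on a made-up data frame (the column names and values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical cleaned data with two character columns and one numeric column
churn_clean <- data.frame(
  contract        = c("Month-to-month", "One year", "Two year"),
  churn           = c("Yes", "No", "No"),
  monthly_charges = c(70.5, 55.2, 20.1),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; other columns are left alone
churn_fct <- churn_clean %>%
  mutate_if(is.character, as.factor)

# Check the resulting column classes
sapply(churn_fct, class)
```

In recent dplyr versions the same step can also be written as mutate(across(where(is.character), as.factor)).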

Look at the data types now

Note
  • That looks better now: the character columns are stored as factors.

Remove unnecessary columns
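
The text does not say which columns are removed. A common candidate is an identifier column, which carries no information about churn, so the sketch below assumes a hypothetical customer_id column and drops it with dplyr's select().

```r
library(dplyr)

# Hypothetical data; customer_id is an assumed identifier column
churn_clean <- data.frame(
  customer_id     = c("A1", "B2", "C3"),
  monthly_charges = c(70.5, 55.2, 20.1),
  churn           = c("Yes", "No", "No")
)

# A minus sign inside select() drops the named column and keeps the rest
churn_small <- churn_clean %>%
  select(-customer_id)

names(churn_small)
```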

Do we have any missing data?
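
One simple way to answer this is to count the NA values in each column. The sketch below does so in base R on made-up data; the two missing values here are invented and do not correspond to the real dataset.

```r
# Hypothetical data with two missing values in total_charges
churn_clean <- data.frame(
  monthly_charges = c(70.5, 55.2, 20.1, 89.9),
  total_charges   = c(141.0, NA, NA, 89.9),
  churn           = c("Yes", "No", "No", "Yes")
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
missing_per_column <- colSums(is.na(churn_clean))
missing_per_column

# Quick yes/no check for the whole data frame
anyNA(churn_clean)
```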

Note
  • We have 11 missing values in the variable total_charges.
  • Since this is only a handful of rows, we can drop them.

Dropping rows with missing data
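
A minimal base R sketch of this step, again on made-up data. na.omit() keeps only the rows with no missing values; tidyr's drop_na() would be an alternative.

```r
# Hypothetical data with two incomplete rows
churn_clean <- data.frame(
  monthly_charges = c(70.5, 55.2, 20.1, 89.9),
  total_charges   = c(141.0, NA, NA, 89.9),
  churn           = c("Yes", "No", "No", "Yes")
)

# Drop every row that contains at least one NA
churn_complete <- na.omit(churn_clean)

# Compare row counts before and after
nrow(churn_clean)
nrow(churn_complete)
```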

Further wrangling

  • Turn churn into a binary variable
  • Change senior_citizen to an integer
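
The two steps above could be sketched as follows. The coding of churn ("Yes" becomes 1, anything else becomes 0) is an assumption, as are the toy values.

```r
library(dplyr)

# Hypothetical data: churn is a Yes/No factor, senior_citizen is numeric
churn_clean <- data.frame(
  senior_citizen = c(0, 1, 0),
  churn          = factor(c("Yes", "No", "No"))
)

churn_final <- churn_clean %>%
  mutate(
    # Assumed coding: 1 = churned ("Yes"), 0 = did not churn
    churn = ifelse(churn == "Yes", 1L, 0L),
    # Store the 0/1 flag as an integer
    senior_citizen = as.integer(senior_citizen)
  )

str(churn_final)
```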

Now look at the final dataset

What drives customer churn? (t-test)

  • First, let's see whether there is any relationship between churn and monthly charges.
  • Specifically, we want to know whether the difference in mean monthly charges between customers who churned and those who did not is statistically significant. To do this we can use a two-sample t-test (t.test() in R).
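
As a sketch of how such a test can be run, the snippet below calls t.test() with a formula on a small made-up sample. The numbers are invented and will not reproduce the results discussed here. By default t.test() performs Welch's t-test, which does not assume equal variances, and the difference is taken as the first group minus the second ("No" minus "Yes").

```r
# Hypothetical sample: six customers per group, values invented
toy <- data.frame(
  churn           = rep(c("No", "Yes"), each = 6),
  monthly_charges = c(45, 60, 52, 70, 58, 65, 80, 72, 90, 68, 85, 77)
)

# Two-sample t-test of monthly_charges by churn group (Welch by default)
result <- t.test(monthly_charges ~ churn, data = toy)

result$estimate   # mean of each group
result$conf.int   # 95% confidence interval for mean(No) - mean(Yes)
result$p.value    # p-value for H0: difference in means is 0
```

Printing result on its own shows the full summary, including the t statistic and degrees of freedom.
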
Explaining the above output
  • Null Hypothesis \((H_0)\): The true difference in means between the two groups is equal to 0.
  • Alternative Hypothesis \((H_1)\): The true difference in means between the two groups is not equal to 0.
  • The p-value is less than 2.2e-16, which is extremely small. This suggests strong evidence against the null hypothesis. Therefore, we reject the null hypothesis in favor of the alternative hypothesis.
  • The 95% confidence interval for the difference in means is given as (-14.53786, -11.72998).
  • This interval provides a range within which we are 95% confident that the true difference in means lies.
  • The mean monthly_charges for the “No” group is approximately 61.31.
  • The mean monthly_charges for the “Yes” group (presumably churned customers) is approximately 74.44.

In summary, there is strong evidence that mean monthly_charges differ between the two groups. Churned customers (group “Yes”) tend to have higher monthly charges than non-churned customers (group “No”): about 74.44 versus 61.31, a gap of roughly 13.13 that falls inside the confidence interval reported above.

Is churn associated with contract type?

  • We can answer this using a chi-squared test of association.
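
A minimal sketch of such a test: cross-tabulate the two variables with table() and pass the result to chisq.test(). The contract labels and counts below are invented for illustration and will not reproduce the statistic discussed here.

```r
# Hypothetical sample of 100 customers across three assumed contract types
toy <- data.frame(
  contract = rep(c("Month-to-month", "One year", "Two year"),
                 times = c(40, 30, 30)),
  churn = c(
    rep(c("Yes", "No"), times = c(25, 15)),  # Month-to-month
    rep(c("Yes", "No"), times = c(6, 24)),   # One year
    rep(c("Yes", "No"), times = c(2, 28))    # Two year
  )
)

# Contingency table of contract type by churn
tab <- table(toy$contract, toy$churn)
tab

# Chi-squared test; H0 is that contract type and churn are independent
result <- chisq.test(tab)

result$statistic  # chi-squared test statistic
result$parameter  # degrees of freedom
result$p.value    # p-value
```
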
Explaining results
  • The calculated chi-squared test statistic is 1179.5.
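  • The test has 2 degrees of freedom. For a contingency table, the degrees of freedom are (rows − 1) × (columns − 1), so with two churn outcomes this corresponds to three contract categories.
  • The reported p-value is less than 2.2e-16, which is strong evidence against the null hypothesis that churn and contract type are independent.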

Since the p-value is extremely small, we reject the null hypothesis. This suggests that there is a significant association between customer churn and contract type.