What drives customer churn?
An interactive learning exercise in R
Customer churn (also known as customer attrition) refers to the phenomenon where customers stop using a product or service provided by a company. It's essential for businesses to understand why customers leave and take steps to reduce churn.
The goal of this customer churn project is to analyze historical data and identify patterns that lead to churn. By understanding these patterns, businesses can take targeted actions to retain customers.
Load in the dataset
Look at the structure of the data
- Generally, the dataset has ____ rows and ____ columns.
- We need to tune our data a bit in order to fix some anomalies in it. We need to:
  - change character columns to factors
  - change the naming convention of our variables
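The loading and inspection steps can be sketched in base R as follows. The file name of the real dataset is not given here, so this example writes a tiny toy CSV to a temporary path; its rows and CamelCase column names (customerID, SeniorCitizen, Contract, MonthlyCharges, TotalCharges, Churn) are hypothetical stand-ins, not the actual data.

```r
# Hypothetical toy CSV standing in for the real churn file (not real data).
# Replace `path` with the location of the actual dataset.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "customerID,SeniorCitizen,Contract,MonthlyCharges,TotalCharges,Churn",
  "0001-A,0,Month-to-month,70.70,151.65,Yes",
  "0002-B,1,Two year,56.95,1889.50,No",
  "0003-C,0,One year,42.30,,No"
), path)

# In R >= 4.0, read.csv() keeps text columns as character (stringsAsFactors = FALSE);
# the blank TotalCharges field in a numeric column is read as NA
churn_raw <- read.csv(path)

dim(churn_raw)   # number of rows and columns
str(churn_raw)   # column names, types and first few values
```

str() is where character columns and odd names show up, which is what motivates the clean-up steps listed above.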
Create clean names for our variables
- We can use the clean_names() function from the janitor package to get more standardized, clean names.
- Let us compare the old variable names with the new ones.
Old names
New names
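The renaming step can be sketched with janitor's clean_names(), whose default output is snake_case. The toy data frame and its CamelCase column names are hypothetical, chosen to resemble the cleaned names used later (senior_citizen, monthly_charges, churn).

```r
library(janitor)  # provides clean_names()

# Hypothetical toy data with CamelCase column names
churn_raw <- data.frame(
  customerID     = c("0001-A", "0002-B"),
  SeniorCitizen  = c(0L, 1L),
  MonthlyCharges = c(70.70, 56.95),
  Churn          = c("Yes", "No")
)

old_names <- names(churn_raw)
churn     <- clean_names(churn_raw)  # default case = "snake"
new_names <- names(churn)

# Old and new names side by side
data.frame(old = old_names, new = new_names)
```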
Change characters to factors
- To achieve this, we can use the mutate_if() function from dplyr.
Look at the data types now
- That is better.
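A minimal sketch of the character-to-factor conversion with dplyr's mutate_if(), using a small hypothetical data frame rather than the real dataset:

```r
library(dplyr)  # provides mutate_if() and the %>% pipe

# Hypothetical toy data: two character columns and one numeric column
churn <- data.frame(
  contract        = c("Month-to-month", "Two year", "One year"),
  monthly_charges = c(70.70, 56.95, 42.30),
  churn           = c("Yes", "No", "No")
)

# Convert every character column to a factor; numeric columns are left alone
churn <- churn %>% mutate_if(is.character, as.factor)

sapply(churn, class)  # check the resulting data types
```

In current dplyr, mutate_if() is superseded; the equivalent is mutate(across(where(is.character), as.factor)).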
Remove unnecessary columns
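The text does not say which columns are removed. As an assumption for illustration only, this sketch drops a hypothetical identifier column, customer_id, since an ID carries no information about churn; it uses base R on toy data.

```r
# Hypothetical toy data with an identifier column
churn <- data.frame(
  customer_id     = c("0001-A", "0002-B"),
  monthly_charges = c(70.70, 56.95),
  churn           = factor(c("Yes", "No"))
)

# Keep every column except customer_id (drop = FALSE keeps a data frame);
# with dplyr this would be select(churn, -customer_id)
churn <- churn[, setdiff(names(churn), "customer_id"), drop = FALSE]

names(churn)
```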
Do we have any missing data?
- We have 11 missing values in the variable total_charges.
- We can drop these rows.
Dropping rows with missing data
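Counting and dropping missing values can be sketched in base R as below. The toy data frame is hypothetical and has two missing total_charges values instead of the 11 in the real data.

```r
# Hypothetical toy data with missing total_charges values
churn <- data.frame(
  monthly_charges = c(70.70, 56.95, 42.30, 20.05),
  total_charges   = c(151.65, NA, 1840.75, NA),
  churn           = factor(c("Yes", "No", "No", "No"))
)

colSums(is.na(churn))  # number of missing values per column

# Keep only rows without any NA (tidyr::drop_na() does the same)
churn <- churn[complete.cases(churn), ]

nrow(churn)
```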
Further wrangling
- Turn churn into a binary variable.
- Change senior_citizen to an integer.
Now look at the final dataset.
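One way to do these two steps, sketched on hypothetical toy data. It assumes "binary" means a two-level factor with "No" as the reference level, which matches the "No"/"Yes" group labels in the t-test output below; a 0/1 integer version is shown as an alternative. It also assumes senior_citizen may have ended up as a factor.

```r
# Hypothetical toy data; senior_citizen stored as a factor of "0"/"1"
churn <- data.frame(
  senior_citizen = factor(c("0", "1", "0")),
  churn          = c("Yes", "No", "No")
)

# Churn as a two-level factor with "No" as the reference level
churn$churn <- factor(churn$churn, levels = c("No", "Yes"))

# Go through character first: as.integer() on a factor returns the level
# codes (1, 2), not the values 0 and 1
churn$senior_citizen <- as.integer(as.character(churn$senior_citizen))

# Alternative 0/1 coding of churn (1 = churned)
churn_01 <- as.integer(churn$churn == "Yes")

str(churn)
```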
What drives customer churn? (t-test)
- First, let's see whether there is any relationship between churn and monthly charges.
- Specifically, we want to know whether the difference in mean monthly charges between customers who churned and those who did not is statistically significant. To do this we can use a t-test.
- Null Hypothesis \((H_0)\): The true difference in means between the two groups is equal to 0.
- Alternative Hypothesis \((H_1)\): The true difference in means between the two groups is not equal to 0.
- The p-value is less than 2.2e-16, which is extremely small. This suggests strong evidence against the null hypothesis. Therefore, we reject the null hypothesis in favor of the alternative hypothesis.
- The 95% confidence interval for the difference in means (No minus Yes) is (-14.53786, -11.72998).
- This interval gives a range within which we are 95% confident that the true difference in means lies. Because it does not contain 0, it agrees with rejecting the null hypothesis at the 5% level.
- The mean monthly_charges for the "No" group is approximately 61.31.
- The mean monthly_charges for the "Yes" group (presumably churned customers) is approximately 74.44.
In summary, there is strong evidence that mean monthly_charges differ significantly between the two groups. Churned customers (group "Yes") tend to have higher monthly charges than non-churned customers (group "No").
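A minimal sketch of running this test in R with base stats::t.test(). The values below are hypothetical toy data, not the churn dataset, so the numbers differ from those reported above; only the call pattern carries over.

```r
# Hypothetical toy data for illustration only
churn <- data.frame(
  monthly_charges = c(65.2, 58.9, 61.0, 55.4, 80.1, 74.5, 69.8, 77.3),
  churn           = factor(c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
                           levels = c("No", "Yes"))
)

# Two-sided Welch two-sample t-test (R's default, var.equal = FALSE);
# the difference is taken as first level minus second level (No - Yes)
res <- t.test(monthly_charges ~ churn, data = churn)

res$p.value   # compare with the chosen significance level, e.g. 0.05
res$conf.int  # 95% CI for mean(No) - mean(Yes)
res$estimate  # "mean in group No" and "mean in group Yes"
```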
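As a quick arithmetic check on the reported numbers, assuming the default Welch version of the test was used (R's t.test() uses var.equal = FALSE unless told otherwise), with group means \(\bar{x}\), sample standard deviations \(s\) and group sizes \(n\):

\[
t = \frac{\bar{x}_{\text{No}} - \bar{x}_{\text{Yes}}}{\sqrt{\dfrac{s_{\text{No}}^{2}}{n_{\text{No}}} + \dfrac{s_{\text{Yes}}^{2}}{n_{\text{Yes}}}}},
\qquad
\bar{x}_{\text{No}} - \bar{x}_{\text{Yes}} \approx 61.31 - 74.44 = -13.13
\]

\[
\frac{-14.53786 + (-11.72998)}{2} = -13.13392
\]

The midpoint of the confidence interval matches the difference of the two group means, which confirms the interval is for mean(No) minus mean(Yes); the negative sign reflects the higher charges in the churned group.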
Is churn associated with contract type?
- We can answer this using a chi-squared test of independence.
- The calculated chi-squared test statistic is 1179.5.
- The degrees of freedom for this test are 2.
- The reported p-value is less than 2.2e-16, which indicates strong evidence against the null hypothesis that churn and contract type are independent.
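A minimal sketch of this test with base stats::chisq.test() on a contract-by-churn contingency table. The counts are hypothetical toy data, not the real dataset, so the statistic and p-value differ from those reported above.

```r
# Hypothetical toy data: 100 customers across three contract types
contract <- factor(rep(c("Month-to-month", "One year", "Two year"),
                       times = c(40, 30, 30)))
churn <- factor(rep(c("Yes", "No", "Yes", "No", "Yes", "No"),
                    times = c(20, 20, 5, 25, 2, 28)),
                levels = c("No", "Yes"))

tab <- table(contract, churn)  # 3 x 2 contingency table
res <- chisq.test(tab)         # Pearson test; no continuity correction for 3 x 2

res$statistic  # X-squared
res$parameter  # degrees of freedom
res$p.value
res$expected   # expected counts under independence (check they are not tiny)
```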
Since the p-value is extremely small, we reject the null hypothesis. This suggests that there is a significant association between customer churn and contract type.
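For reference, the Pearson statistic compares the observed counts \(O_{ij}\) in the contract-by-churn table with the counts \(E_{ij}\) expected under independence, where \(R_i\) and \(C_j\) are the row and column totals and \(N\) is the number of customers:

\[
\chi^{2} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}},
\qquad
E_{ij} = \frac{R_i\, C_j}{N},
\qquad
\text{df} = (r-1)(c-1)
\]

Since churn has two levels (\(c = 2\)), the reported 2 degrees of freedom correspond to \(r = 3\) contract types: \((3-1)(2-1) = 2\).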