What is Churn?
This is the amount of customers who cancel/stop their subscription to a service/product within a specified period of time.
The image above represents two overlapping circles. The blue area is the amount of subscribers that cancel within a given period of time.
The acquisition area is the amount of customers that subscribe for the service during the measured time.
The overlapping area(retention) is the amount of subscribers that renew their subscriptions.
What is Churn rate?
This is the proportion of customers who cancel their subscription to a service or product within a given period of time to the total number of subscribers at the start of that period.
Customers who subscribe to the service during the month are not included in the churn rate calculation.
This metric can be calculated by dividing the number of cancellations by the total number of subscribers at the beginning that period i.e. the number of active customers for that period.
Why Churn rate is important?
You can use churn rate to:
- Evaluate the performance of a product or service
- Forecast the performance of a product or service
- Understand the effects of certain changes in the product or service
- Make informed decisions to increase customer retention
Let's create a dummy 'subscriptions' table in our database from one of our previous sessions.
Read: SQL for Data Science: Create Your First Database and Table
Click on the 'Execute SQL' button and enter the below code.
CREATE TABLE subscriptions( id INTEGER PRIMARY KEY, customer_id INTEGER, start_time INTEGER, end_time INTEGER );
The code above creates a dummy 'subscriptions' table with 4 columns. You would notice we added a primary key constraint for the 'id' column. This means that no two records can have the same value for this column. Every customer has a unique row.
SQLite does not support built-in date and/or time storage class. Instead, it uses other storage classes such as TEXT , REAL , or INTEGER for storing the date and time values.
Let's populate our table with some records.
INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(91,"2020-01-04","2020-04-04"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(102,"2020-01-07","2020-04-07"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(183,"2020-01-12","2020-04-12"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(204,"2020-01-17","2020-02-17"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(265,"2020-01-19","2020-02-19"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(556,"2020-02-21","2020-03-21"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(598,"2020-02-25","2020-04-25"); INSERT INTO subscriptions(customer_id,start_time,end_time) VALUES(1556,"2020-02-09","2020-05-09");
We did not specify values for the 'id' column because SQLite automatically generates an row-id column. The primary key column points to this row.
Let's view the table
SELECT * FROM subscriptions;
The table contains records of subscriptions of customers and their subscription duration. Let's calculate the churn rate for the month of February.
The churn for February would be the amount of subscribers that cancelled their subscriptions in this month. It also contains customers whose subscription expired in February.
To calculate the churn rate, we'll need to know the total number of subscribers at the start of February.
We can do this by checking for subscribers who still have active subscriptions as at the first day in February. The query below would return these records.
SELECT * FROM subscriptions WHERE start_time<"2020-02-01" AND end_time>="2020-02-01";
To find the customers who cancelled or had expired subscription, we'll run the query below.
SELECT * FROM subscriptions WHERE start_time<"2020-02-01" AND end_time<="2020-02-28";
To calculate the churn rate, we can simply divide the churn by the total subscribers. Lets combine the previous queries into one.
SELECT (1.0*(SELECT COUNT(*) FROM subscriptions WHERE start_time<"2020-02-01" AND end_time<="2020-02-28"))/ (1.0*(SELECT COUNT(*) FROM subscriptions WHERE start_time<"2020-02-01" AND end_time>="2020-02-01")) AS "churn_rate";
We can perform mathematical operations with the SELECT command. SELECT 1.0*5 gives 5.0, SELECT 1.0/4.0 gives 0.25.
We multiplied each result by 1.0 to convert the integer values to float because SQLite chops off decimal points in the division operation of integers.
This method can get messy when calculating the churn for multiple months.
Let's do this in a better way.
WITH enrollments AS (SELECT * FROM subscriptions WHERE start_time<"2020-02-01" AND end_time>="2020-02-01"), status AS (SELECT customer_id, CASE WHEN start_time<"2020-02-01" AND (end_time>="2020-02-01" OR end_time IS NULL) THEN 1 ELSE 0 END AS is_active, CASE WHEN end_time>"2020-02-28" OR end_time IS NULL THEN 0 ELSE 1 END AS is_cancelled FROM enrollments) SELECT (1.0*SUM(is_cancelled))/ (1.0*SUM(is_active)) AS "churn_rate" FROM status;
We created an 'enrollments' table to include all records within a given period of time (1 month in this case).
Then, we created a 'status' table to specify active and cancelled subscriptions within that given period of time.
CASE
WHEN start_time<"2020-02-01"
AND (end_time>="2020-02-01" OR end_time IS NULL) THEN 1
ELSE 0
END AS is_active
The code above creates a temporary column, 'is_active', in the 'status' table. Like we defined earlier, active subscribers are customers that are still enrolled for a service as at the beginning of the month. Even if a customer cancels on that day, he/she is considered active for that month.
CASE
WHEN end_time>"2020-02-28" OR end_time IS NULL THEN 0
ELSE 1
END AS is_cancelled
The code above creates another temporary column, 'is_cancelled', in the 'status' table. If a subscriber cancels within the month, it fills in '1' for this column and '0' otherwise.
Any record with a null value in the 'end_time' column has not cancelled.
SELECT (1.0*SUM(is_cancelled))/(1.0*SUM(is_active)) AS "churn_rate" FROM status;
This single month calculation can be extended to multiple months.
Funnel Analysis
You can also evaluate the performance of your product/service by calculating the conversion rates between events. Let's say you have a website for your product. Some of the events could be:
- View the product
- Add the product to cart
- Checkout the product
We might want to know how much people completed each process. Let's say 100 people view a product and only 40 add the product to cart. The conversion rate in this scenario is 40/100 = 0.4 or 40%.
Out of the 40 who added to cart, only 15 checked out the product. The conversion rate is 20/40 = 0.5 or 50%.
This metric can give a great overview about your product/service. For this use case, you might want to question why many visitors don't add their viewed products to cart.
You might consider reducing the price of the product then sending a personalized mail to every visitor about a special offer on that product just for them. You should rebuild the funnel after some time to see if the conversion rate increases or decreases.
You might also want to consider increasing the cart-to-checkout conversion rate.
Imagine Alex. Alex finds a very cool birthday gift for his friend and adds it to his cart. Before he can advance to the checkout page his sister calls, asking for a recipe. Alex navigates away from the checkout page to look up google for the recipe. He totally forgets to complete his purchase.
Simply sending a reminder mail to Alex would work in this scenario.
Funnels give insights on improving a product or service. You might consider creating a funnel whenever you change some features in your product/service. This can help you make well informed decisions.
Let's create another dummy table to that includes the events of visitors on a website, "mydatacourse"
The events in this funnels would be:
- View Course
- Add Course to Cart
- Checkout Course
Run the below query in your sql editor.
CREATE TABLE mydatacourse( user_id INTEGER, user_action TEXT, timestamp TEXT ); INSERT INTO mydatacourse VALUES (1, "1. View Course","2018-01-02 23:14:01"); INSERT INTO mydatacourse VALUES (1, "2. Add Course to Cart","2020-01-02 23:20:01"); INSERT INTO mydatacourse VALUES (1, "3. Checkout Course","2020-01-02 23:21:10"); INSERT INTO mydatacourse VALUES (2, "1. View Course","2020-01-03 05:20:15"); INSERT INTO mydatacourse VALUES (2, "2. Add Course to Cart","2020-01-03 05:24:09"); INSERT INTO mydatacourse VALUES (3, "1. View Course","2020-01-03 18:15:01"); INSERT INTO mydatacourse VALUES (4, "1. View Course","2020-01-03 20:14:01"); INSERT INTO mydatacourse VALUES (4, "2. Add Course to Cart","2020-01-03 20:19:01"); INSERT INTO mydatacourse VALUES (2, "3. Checkout Course","2020-01-04 23:14:01");
Let's find the number of visitors who completed each action
SELECT user_action, COUNT(*) as "number" FROM mydatacourse GROUP BY 1;
The conversion rate from event 1 to event 2 is 3/4 = 0.75
The conversion rate from event 2 to event 3 is 2/3 = 0.66
Try the SELECT statements for this calculation by yourself.
Practice these concepts on your own and get comfortable with them.
We have just concluded the SQL series for this Data Science Career Path. In the next blog post, we'll begin with the Python series.
Insightful
ReplyDeleteCould you also pls give hint to the extension of this customer churn rate calculation to multiple months?
ReplyDeleteChange the start and end time to suit your needs. If you'll be calculating the churn for a 3 month period, January to March maybe. You can choose a start time of 01-01-xxxx and an end time of 01-03-xxxx
Delete