SQL Churn Rate and Funnel Analysis

What is Churn?

This is the amount of customers who cancel/stop their subscription to a service/product within a specified period of time.

Churn

The image above represents two overlapping circles. The blue area is the amount of subscribers that cancel within a given period of time.

The acquisition area is the amount of customers that subscribe for the service during the measured time.

The overlapping area(retention) is the amount of subscribers that renew their subscriptions.

What is Churn rate?

This is the proportion of customers who cancel their subscription to a service or product within a given period of time to the total number of subscribers at the start of that period.

Customers who subscribe to the service during the month are not included in the churn rate calculation.

This metric can be calculated by dividing the number of cancellations by the total number of subscribers at the beginning that period i.e. the number of active customers for that period.

Churn rate formula

Why Churn rate is important?

You can use churn rate to:

  • Evaluate the performance of a product or service
  • Forecast the performance of a product or service
  • Understand the effects of certain changes in the product or service
  • Make informed decisions to increase customer retention

Let's create a dummy 'subscriptions' table in our database from one of our previous sessions.

Read: SQL for Data Science: Create Your First Database and Table

Click on the 'Execute SQL' button and enter the below code.

CREATE TABLE subscriptions(
id INTEGER PRIMARY KEY,
customer_id INTEGER,
start_time INTEGER, 
end_time INTEGER
);

The code above creates a dummy 'subscriptions' table with 4 columns. You would notice we added a primary key constraint for the 'id' column. This means that no two records can have the same value for this column. Every customer has a unique row.

SQLite does not support built-in date and/or time storage class. Instead, it uses other storage classes such as TEXT , REAL , or INTEGER for storing the date and time values.

Let's populate our table with some records.

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(91,"2020-01-04","2020-04-04");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(102,"2020-01-07","2020-04-07");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(183,"2020-01-12","2020-04-12");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(204,"2020-01-17","2020-02-17");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(265,"2020-01-19","2020-02-19");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(556,"2020-02-21","2020-03-21");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(598,"2020-02-25","2020-04-25");

INSERT INTO subscriptions(customer_id,start_time,end_time)
VALUES(1556,"2020-02-09","2020-05-09");

We did not specify values for the 'id' column because SQLite automatically generates an row-id column. The primary key column points to this row.

Let's view the table

SELECT * FROM subscriptions;

Subscriptions table

The table contains records of subscriptions of customers and their subscription duration. Let's calculate the churn rate for the month of February.

The churn for February would be the amount of subscribers that cancelled their subscriptions in this month. It also contains customers whose subscription expired in February.

To calculate the churn rate, we'll need to know the total number of subscribers at the start of February.

We can do this by checking for subscribers who still have active subscriptions as at the first day in February. The query below would return these records.

SELECT * FROM subscriptions
WHERE start_time<"2020-02-01" AND end_time>="2020-02-01";

Active subscriptions

To find the customers who cancelled or had expired subscription, we'll run the query below.

SELECT * FROM subscriptions
WHERE start_time<"2020-02-01" AND end_time<="2020-02-28";

Cancelled subscriptions

To calculate the churn rate, we can simply divide the churn by the total subscribers. Lets combine the previous queries into one.

SELECT 
(1.0*(SELECT COUNT(*) FROM subscriptions
WHERE start_time<"2020-02-01" AND end_time<="2020-02-28"))/
(1.0*(SELECT COUNT(*) FROM subscriptions
WHERE start_time<"2020-02-01" AND end_time>="2020-02-01")) 
AS "churn_rate";

Churn rate

We can perform mathematical operations with the SELECT command. SELECT 1.0*5 gives 5.0, SELECT 1.0/4.0 gives 0.25.

We multiplied each result by 1.0 to convert the integer values to float because SQLite chops off decimal points in the division operation of integers.

This method can get messy when calculating the churn for multiple months.

Let's do this in a better way.

WITH enrollments AS 
(SELECT * FROM subscriptions
 WHERE start_time<"2020-02-01" AND
 end_time>="2020-02-01"),

status AS 
(SELECT customer_id, 
CASE
    WHEN start_time<"2020-02-01" AND 
   (end_time>="2020-02-01" OR end_time IS NULL) THEN 1
    ELSE 0
END AS is_active,
CASE
    WHEN end_time>"2020-02-28" OR end_time IS NULL THEN 0
    ELSE 1
END AS is_cancelled
FROM enrollments)

SELECT (1.0*SUM(is_cancelled))/
(1.0*SUM(is_active)) AS "churn_rate"
FROM status;

Churn rate

We created an 'enrollments' table to include all records within a given period of time (1 month in this case).

Then, we created a 'status' table to specify active and cancelled subscriptions within that given period of time.

CASE
WHEN start_time<"2020-02-01"
AND (end_time>="2020-02-01" OR end_time IS NULL) THEN 1
ELSE 0
END AS is_active

The code above creates a temporary column, 'is_active', in the 'status' table. Like we defined earlier, active subscribers are customers that are still enrolled for a service as at the beginning of the month. Even if a customer cancels on that day, he/she is considered active for that month.

CASE
WHEN end_time>"2020-02-28" OR end_time IS NULL THEN 0
ELSE 1
END AS is_cancelled

The code above creates another temporary column, 'is_cancelled', in the 'status' table. If a subscriber cancels within the month, it fills in '1' for this column and '0' otherwise.

Any record with a null value in the 'end_time' column has not cancelled.

SELECT (1.0*SUM(is_cancelled))/(1.0*SUM(is_active)) AS "churn_rate" FROM status;

This single month calculation can be extended to multiple months.

Funnel Analysis

You can also evaluate the performance of your product/service by calculating the conversion rates between events. Let's say you have a website for your product. Some of the events could be:

  1. View the product
  2. Add the product to cart
  3. Checkout the product

We might want to know how much people completed each process. Let's say 100 people view a product and only 40 add the product to cart. The conversion rate in this scenario is 40/100 = 0.4 or 40%.

Out of the 40 who added to cart, only 15 checked out the product. The conversion rate is 20/40 = 0.5 or 50%.

This metric can give a great overview about your product/service. For this use case, you might want to question why many visitors don't add their viewed products to cart.

You might consider reducing the price of the product then sending a personalized mail to every visitor about a special offer on that product just for them. You should rebuild the funnel after some time to see if the conversion rate increases or decreases.

You might also want to consider increasing the cart-to-checkout conversion rate.

Imagine Alex. Alex finds a very cool birthday gift for his friend and adds it to his cart. Before he can advance to the checkout page his sister calls, asking for a recipe. Alex navigates away from the checkout page to look up google for the recipe. He totally forgets to complete his purchase.

Simply sending a reminder mail to Alex would work in this scenario.

Funnels give insights on improving a product or service. You might consider creating a funnel whenever you change some features in your product/service. This can help you make well informed decisions.

Let's create another dummy table to that includes the events of visitors on a website, "mydatacourse"

The events in this funnels would be:

  1. View Course
  2. Add Course to Cart
  3. Checkout Course

Run the below query in your sql editor.

CREATE TABLE mydatacourse(
user_id INTEGER,
user_action TEXT,
timestamp TEXT 
);

INSERT INTO mydatacourse
VALUES (1, "1. View Course","2018-01-02 23:14:01");

INSERT INTO mydatacourse
VALUES (1, "2. Add Course to Cart","2020-01-02 23:20:01");

INSERT INTO mydatacourse
VALUES (1, "3. Checkout Course","2020-01-02 23:21:10");

INSERT INTO mydatacourse
VALUES (2, "1. View Course","2020-01-03 05:20:15");

INSERT INTO mydatacourse
VALUES (2, "2. Add Course to Cart","2020-01-03 05:24:09");

INSERT INTO mydatacourse
VALUES (3, "1. View Course","2020-01-03 18:15:01");

INSERT INTO mydatacourse
VALUES (4, "1. View Course","2020-01-03 20:14:01");

INSERT INTO mydatacourse
VALUES (4, "2. Add Course to Cart","2020-01-03 20:19:01");

INSERT INTO mydatacourse
VALUES (2, "3. Checkout Course","2020-01-04 23:14:01");

Let's find the number of visitors who completed each action

SELECT user_action, COUNT(*) as "number"
FROM mydatacourse
GROUP BY 1;

Funnels

The conversion rate from event 1 to event 2 is 3/4 = 0.75

The conversion rate from event 2 to event 3 is 2/3 = 0.66

Try the SELECT statements for this calculation by yourself.

Practice these concepts on your own and get comfortable with them.

We have just concluded the SQL series for this Data Science Career Path. In the next blog post, we'll begin with the Python series.

3 comments:

  1. Could you also pls give hint to the extension of this customer churn rate calculation to multiple months?

    ReplyDelete
    Replies
    1. Change the start and end time to suit your needs. If you'll be calculating the churn for a 3 month period, January to March maybe. You can choose a start time of 01-01-xxxx and an end time of 01-03-xxxx

      Delete

Powered by Blogger.