Loyal customers who love your brand and buy from you time and time again—it’s the stuff of every ecommerce marketer’s dreams. But how do you make those dreams come true?

One tool brands are increasingly implementing to augment their customer acquisition and retention strategies? Subscription services.

The market for ecommerce subscription services is big. In the past five years, it’s grown by more than 100 percent each year, according to McKinsey.

"The subscription e-commerce market has grown by more than 100 percent a year over the past five years."

McKinsey & Co.

It makes sense. Subscription services are convenient, affordable, and exciting. And they’re a great way to nurture ongoing relationships with your customers.

Picture a package of clothing or specialty foods arriving at your front door, filled with a curated set of products based on your personal preferences. With subscription services, you often discover products you never would’ve known about if they hadn’t been chosen for you.

Or imagine being able to easily and mindlessly replenish many of the products you continuously repurchase, like pet food, household cleaning supplies, personal care items, and so much more. Putting these purchases on autopilot lets you run one less errand during your free time or make one less decision on busy days. Plus, many offer discounts on the purchase price of recurring orders, which makes it easy to save money on things you buy regularly.

But what happens when your customers don’t love the products they get in their subscriptions? Do they keep letting those orders run regularly or do they cancel? Chances are, if they’re not happy, they’ll skip their next order or they’ll cancel their subscription completely.

It’s costly to acquire new customers, so it’s well worth your time to keep the ones you’ve got. How can you minimize subscription cancellations? Can you identify the products people tend to love and the ones they tend to dislike? And can you use that data to include or exclude those products from your subscriptions?

Let’s take a look at one company Klaviyo’s data science team worked with to dig into such questions.

Super Subscription Box (an anonymized company we’ll refer to as SSB) is a typical subscription box business. Every month, SSB sends their subscribers a curated product, and customers never receive the same product twice. There’s no fixed progression of when a product appears in a customer’s subscription, so different customers may get the same product during different months of their subscriptions. SSB also has a few rules to exclude certain products from the subscription boxes of customers with certain needs (e.g., vegan customers will never receive products with animal derivatives). As a result, customers have very different product journeys during their subscription to SSB.

Here’s an example of what three customers’ journeys might look like:

SSB shared with Klaviyo’s data science team the list of every product each customer had received, the date they received it, and when each customer canceled. With this information, the team was able to construct the product journey for each customer.

SSB offered a significant number of products with their subscriptions, which meant an even greater number of product combinations a customer could receive. While customers’ journeys varied widely, some customers still had identical experiences. For example, everyone who received a pair of flip flops in month one and then canceled had the same journey. But people like Lisa in the example above, who kept her subscription for a long time, had journeys few others shared. The data science team worked with these customer journeys to learn which products in SSB’s catalog were performing better or worse.

To go from a list of customer journeys to a numerical score for each product, the team chose to model SSB’s subscription plan as a Markov chain, which is a way of modeling a discrete sequence of events or states. (If you want to dig deeper into the data science, check out the section at the end of this article; if that’s not for you, keep reading and we’ll sum up the main points.)

For each product journey, the team considered each month and product pair as a unique state. At each month, customers could either go on to the next month of their subscription or unsubscribe. With the Markov chain model, the team wanted to understand why SSB’s customers canceled their subscriptions and sought to answer a few questions, including:

1. Do customers cancel based on what products they receive or is the length of time in the subscription the only predictor of how likely they are to cancel?
2. Do some products cause more cancellations than others?
3. What’s more important: the length of time of the customer’s subscription or the product itself?

The data showed a few things.

First, cancellations occurred because of both time (the length of a customer’s subscription) and product (what the customer received in their subscription).

Second, not all products are equal. Some products performed really well. Others had a high probability of causing a cancellation, so removing those items from the subscription is likely a good choice for SSB.

Third, the data showed that cancellations during the second month likely happen because a customer disliked a product. The team believes many customers who sign up are willing to try the subscription for the first month, but if they aren’t impressed by what they receive, they cancel when the second month comes around. That makes the second month especially important for including a high-performing product in the customer’s subscription.

As we can see with SSB’s subscription data, it’s important to know which of the products in your subscription are likely to be ones your customers will love and which will likely be the ones they’ll dislike. This information will help you avoid costly cancellations and preserve the valuable customer relationships you’ve worked so hard to build.

**********************************

## Diving Deep into the Data Science

Want to dig a little deeper into the SSB example? Christina Dedrick, data scientist at Klaviyo, takes you through it.

### What is a Markov chain?

Markov chains are a way of modeling a discrete sequence of events or states. They’re made of nodes, which represent states, and transitions between nodes, shown as arrows. The next transition depends only on the current state. Each transition is assigned a probability, and the probabilities out of a node must sum to exactly 1. (The probabilities going into a state can sum to more than 1, because there can be many ways into a state.)

For instance, the following Markov Chain could represent someone’s mood:

In this chain, there are two states represented by the two nodes: happy and sad. Each arrow represents a transition between states, and there can be transitions that keep you in the same state you started in.

In the chain above, if you’re in the sad state, there’s a 50 percent chance you stay sad and a 50 percent chance you become happy. Once you’re happy, there’s an 80 percent chance you stay happy and a 20 percent chance you transition to sad.

While being happy determines how likely you are to stay happy during your next transition, the state you were in before you were happy does not affect that. So it’s equally likely that you’ll stay happy if your previous states were “sad to happy” and “happy to happy.”
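As a quick sketch, the example chain above can be simulated in a few lines of Python (the 0.8/0.5 probabilities are the ones from the example):

```python
import random

random.seed(0)

# Transition probabilities from the example chain:
# from happy: stay happy 0.8, become sad 0.2
# from sad:   become happy 0.5, stay sad 0.5
P = {"h": {"h": 0.8, "s": 0.2},
     "s": {"h": 0.5, "s": 0.5}}

def step(state):
    # The next mood depends only on the current mood (the Markov property).
    return "h" if random.random() < P[state]["h"] else "s"

state = "s"
sequence = [state]
for _ in range(10):
    state = step(state)
    sequence.append(state)

print("".join(sequence))  # a random walk through moods
```

Notice that `step` never looks at anything except the current state, which is exactly the Markov property described above.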

Now, what if we don’t know what the transition probabilities are, but we know the sequence of states for a sample of people? For our happy/sad example, we might know our people fit the following model, but we don’t yet know what the probabilities are.

In this model:

• Ph->h is the probability of transitioning to the happy state from the happy state.
• Ps->h is the probability of transitioning to the happy state from the sad state.
• Ps->s is the probability of transitioning to the sad state from the sad state.
• Ph->s is the probability of transitioning to the sad state from the happy state.

But we might have a set of data of people’s mood over time as follows (h = happy; s = sad):

| Person | Starting state | State 1 | State 2 | State 3 | State 4 |
| --- | --- | --- | --- | --- | --- |
| 0 | h | s | s | s | h |
| 1 | h | h | h | h | s |
| 2 | h | s | h | h | s |
| 3 | s | h | s | s | h |
| 4 | h | s | h | h | h |
| 5 | s | h | s | s | h |
| 6 | h | s | h | h | s |
| 7 | h | s | h | s | h |
| 8 | s | h | s | h | s |
| 9 | s | s | h | h | s |
| 10 | h | s | h | h | s |
| 11 | s | s | h | s | s |
| 12 | h | s | h | h | s |
| 13 | s | s | h | s | s |
| 14 | h | h | s | h | s |
| 15 | s | h | s | s | s |
| 16 | h | h | s | h | s |
| 17 | s | h | s | h | h |
| 18 | s | s | s | s | s |
| 19 | h | s | s | s | s |
| 20 | h | s | h | s | s |
From this data, we can estimate each transition probability in the Markov chain.

Let’s start with person 0, who starts off happy and follows the sequence happy → sad → sad → sad → happy. The likelihood of observing this sequence is:

$P_{h \rightarrow s} \times P_{s \rightarrow s} \times P_{s \rightarrow s} \times P_{s \rightarrow h}$

Next, for person 1, whose sequence is happy → happy → happy → happy → sad, the likelihood of seeing their sequence of moods is:

$P_{h \rightarrow h} \times P_{h \rightarrow h} \times P_{h \rightarrow h} \times P_{h \rightarrow s}$

We can calculate the probability of each person’s sequence of moods in the same way. Then, since each person’s sequence is independent, we can multiply the probabilities together to get the probability of seeing the entire dataset:

$(P_{h\rightarrow h})^{11}\times(P_{h\rightarrow s})^{20}\times(P_{s\rightarrow h})^{19} \times(P_{s\rightarrow s})^{13}$

Using the Markov property, we know that

$(P_{h\rightarrow s} + P_{h\rightarrow h}) = 1$

and

$(P_{s\rightarrow h} + P_{s\rightarrow s}) = 1$

So we can simplify the equation to:

$(P_{h\rightarrow h})^{11}\times(1-P_{h\rightarrow h})^{20}\times(1-P_{s\rightarrow s})^{19}\times(P_{s\rightarrow s})^{13}$

Now, we can choose the parameters Ph->h and Ps->s (and, using the Markov property, Ph->s and Ps->h) to maximize the probability of seeing the entire dataset. This maximization finds the maximum likelihood estimate (MLE) of the transition probabilities. Constraining Ph->h and Ps->s to be between 0 and 1, we find that the MLE estimates of the parameters are Ph->h = .35 and Ps->s = .41, meaning the Markov chain is as follows:

Compared to the original example, people are a lot moodier with these parameters. Both state transition probabilities are higher, so people change state more often. The probability for staying happy once happy is also much lower, so it’s less likely people stay happy for long periods of time. For this example, the data was generated using the parameters Ph->h = .3 and Ps->s = .4, so our MLE estimates are pretty close.
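For a two-state chain, this maximization actually has a closed form: for a likelihood of the form p^n × (1-p)^m, the maximizing p is n / (n + m). A quick check in Python, using the exponents from the dataset likelihood above:

```python
# Transition counts implied by the exponents in the likelihood above:
# 11 happy->happy, 20 happy->sad, 19 sad->happy, 13 sad->sad.
n_hh, n_hs = 11, 20
n_ss, n_sh = 13, 19

# For a likelihood of the form p^n * (1-p)^m, the maximizing p is n / (n + m).
p_hh = n_hh / (n_hh + n_hs)
p_ss = n_ss / (n_ss + n_sh)

print(round(p_hh, 2), round(p_ss, 2))  # 0.35 0.41, the MLE values in the text
```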

### What did we look at in SSB’s subscription business?

For the customers in SSB’s subscription, we can represent their journey with a Markov chain with a node for each month. In each month of the subscription, a customer can transition to one of two states. A customer can either transition to the next month (with probability 1-p) or a customer can unsubscribe and transition to the unsubscribed state with probability p. Once unsubscribed, a customer stays unsubscribed, so the probability of staying in that state is 1.

This can be shown in the following Markov Chain:
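Walking this chain can be sketched in a few lines of Python. The per-month cancellation probability here is a single illustrative constant, not one of SSB’s fitted coefficients:

```python
import random

random.seed(1)

def months_before_cancel(p_cancel=0.25, max_months=24):
    """Walk the chain: each month the customer cancels with probability
    p_cancel, otherwise transitions to the next month of the subscription.
    p_cancel=0.25 is an illustrative value, not a fitted SSB coefficient."""
    for month in range(1, max_months + 1):
        if random.random() < p_cancel:
            return month
    return max_months  # still subscribed at the simulation horizon

lifetimes = [months_before_cancel() for _ in range(1000)]
print(sum(lifetimes) / len(lifetimes))  # mean lifetime near 1 / 0.25 = 4 months
```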

### What were the statistical assumptions?

This model makes several assumptions when translating SSB’s business strategy into a Markov chain model. We don’t see obvious violations of the assumptions in the data or problem setting.

First, it assumes that when calculating the probability of unsubscribing or staying subscribed, the product and month are the only two factors and that they are independent.

Second, it assumes each transition depends only on the current state, not on previous states (the Markov assumption). This means that when a customer decides to cancel their subscription, it’s only because of the current month’s product, so behavior like “the value of my products has decreased over time and therefore I’ve become unsatisfied” isn’t captured by the model.

Third, it assumes customers do not resubscribe, meaning the transition probability out of the unsubscribed state is 0.

Fourth, it assumes each customer has the same affinity for the products they’re eligible to receive, so the probability Sally cancels after receiving product 1 is the same as the probability Jenny cancels after receiving product 1. While this isn’t necessarily true in real life, we were estimating which products did well or poorly in general. With a large dataset of customers, individual preferences average out and we recover population-level probabilities.

We chose to only model customers who had eventually unsubscribed.

During each month, a customer could unsubscribe because they were either disappointed by the product they received (Pproduct) or they could unsubscribe because they thought they had been in the subscription for long enough (Pmonth).

For a customer who received product 5 during month 1, the probability of transitioning from month 1 to month 2 is the probability they didn’t unsubscribe because of the month, $(1-P_{month_{1}})$, multiplied by the probability they didn’t unsubscribe because of the product, $(1-P_{product_{5}})$. That means the probability they unsubscribed is $1-\left((1-P_{month_{1}})\times(1-P_{product_{5}})\right)$, which equals $P_{month_{1}}+P_{product_{5}}-P_{month_{1}}\times P_{product_{5}}$.
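As a quick numeric sketch, plugging in the fitted values reported in the results table below (Pmonth1 = 38%, Pproduct5 = 40%) shows the two forms agree:

```python
p_month1, p_product5 = 0.38, 0.40  # fitted values from the results table below

# The customer survives month 1 with product 5 only if neither factor
# triggers a cancellation...
p_stay = (1 - p_month1) * (1 - p_product5)
# ...so the cancellation probability is the complement,
p_cancel = 1 - p_stay
# which matches the expanded form in the text.
expanded = p_month1 + p_product5 - p_month1 * p_product5

print(round(p_cancel, 3), round(expanded, 3))  # 0.628 0.628
```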

So the Markov models for Sally, Jenny, and Lisa would look like:

For each customer, we can calculate the probability of observing their data.

For Sally, who received product 1, then product 2, then product 3, and then unsubscribed, the probability she transitions from month 1 to month 2 to month 3 and then unsubscribes is:

(doesn’t cancel after month 1) × (doesn’t cancel after month 2) × (cancels after month 3)

$(1-P_{m_{1}})(1-P_{pr_{1}}) \times (1-P_{m_{2}})(1-P_{pr_{2}}) \times (P_{m_{3}}+P_{pr_{3}} -P_{m_{3}}\times P_{pr_{3}} )$

For Jenny, the probability of her subscription experience is:

$(1-P_{m_{1}})(1-P_{pr_{4}}) \times (P_{m_{2}}+P_{pr_{1}} -P_{m_{2}}\times P_{pr_{1}} )$

And for Lisa, it’s:

$(1-P_{m_{1}})(1-P_{pr_{5}}) \times (1-P_{m_{2}})(1-P_{pr_{2}}) \times(1-P_{m_{3}})(1-P_{pr_{4}}) \times \\ (1-P_{m_{4}})(1-P_{pr_{3}}) \times (P_{m_{5}}+P_{pr_{6}} -P_{m_{5}}\times P_{pr_{6}} )$

To get the overall probability of the customer journeys we observed, we multiply the probability of each customer’s product journey together. We can then solve for the values of the different Pmonth and Pproduct that maximize this overall probability; this maximization finds the MLE for all of the coefficients. To make the optimization easier, we take the log of everything, which converts each multiplication into a sum of log probabilities.
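A minimal sketch of this fitting procedure, using three hypothetical journeys shaped like Sally’s, Jenny’s, and Lisa’s and a crude grid-based coordinate ascent in place of a real numerical optimizer:

```python
import math

# Hypothetical journeys (products received each month); every journey ends in
# a cancellation, since only eventually-unsubscribed customers are modeled.
journeys = [
    [1, 2, 3],        # like Sally: canceled after month 3
    [4, 1],           # like Jenny: canceled after month 2
    [5, 2, 4, 3, 6],  # like Lisa: canceled after month 5
]

def log_likelihood(p_month, p_product):
    """Log-likelihood of the journeys given per-month and per-product
    cancellation probabilities."""
    ll = 0.0
    for journey in journeys:
        last = len(journey) - 1
        for m, prod in enumerate(journey):
            stay = (1 - p_month[m]) * (1 - p_product[prod])
            # The final month ends in a cancellation; earlier months don't.
            ll += math.log(1 - stay) if m == last else math.log(stay)
    return ll

p_month = {m: 0.2 for m in range(5)}       # initial guesses
p_product = {p: 0.2 for p in range(1, 7)}
grid = [i / 100 for i in range(1, 100)]
ll_start = log_likelihood(p_month, p_product)

# Coordinate ascent on a grid: re-optimize one coefficient at a time while
# holding the others fixed (a stand-in for a real MLE solver).
for _ in range(20):
    for m in p_month:
        p_month[m] = max(grid, key=lambda v: log_likelihood({**p_month, m: v}, p_product))
    for p in p_product:
        p_product[p] = max(grid, key=lambda v: log_likelihood(p_month, {**p_product, p: v}))

print(ll_start, log_likelihood(p_month, p_product))  # log-likelihood only improves
```

With only three journeys the fitted values are not meaningful; the point is the shape of the likelihood and the fact that each ascent step can only increase it.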

### What were the results?

For the maximum likelihood estimation of each coefficient, we found:

| Month | Probability of canceling (Pm) |
| --- | --- |
| Month 1 | 38% |
| Month 2 | 21% |
| Month 3 | 26% |
| Month 4 | 25% |
| Month 5 | 29% |
| Month 6 | 35% |

For products, ranked best to worst:

| Product | Probability of canceling (Ppr) | % of customers who received this product |
| --- | --- | --- |
| Product 13 | 7% | 34% |
| Product 10 | 11% | 4% |
| Product 9 | 12% | 15% |
| Product 3 | 15% | 5% |
| Product 6 | 17% | 51% |
| Product 2 | 17% | 86% |
| Product 7 | 18% | 6% |
| Product 14 | 20% | 5% |
| Product 1 | 21% | 34% |
| Product 12 | 25% | 2% |
| Product 11 | 30% | 29% |
| Product 4 | 31% | 2% |
| Product 5 | 40% | 4% |
| Product 15 | 63% | 1% |
| Product 8 | 87% | 2% |

These results told us a few things.

First, not all products are equal. Products 13 and 10 are performing really well. Product 8 is causing a lot of unsubscriptions, so it might be valuable to remove that item from the subscription.

The month coefficients also show that the second month is the most important month to showcase a good product. The month coefficient is the lowest in month two, meaning cancellations this month are more likely to be because of a disliked product. We speculate that many customers who join the subscription are willing to try it out for the first month, but when the second month comes around, if they aren’t impressed by what they’re getting, they cancel. This means it’s most important to put a good product in the second month to minimize cancellations.

We checked the stability of the model by running it on the full dataset as well as on a random subset of customers in the dataset. Later in the project cycle, when more data was available because time had passed, we ran the model again with the larger data set. Each time, we found approximately the same coefficients, so we were pretty confident we weren’t overfitting or hitting numerical instability.

### What’s the advantage of using a Markov chain analysis?

Basic analysis, like counting how many cancellations happened after each product was received, doesn’t allow us to separate the effects that the products and time in the subscription each had on cancellations. Taking the average cancellation rate per product wouldn’t be a fair comparison between products. For example, if a “good” product (one people liked) was mostly sent during a “bad” month (one with a high cancellation rate) and only rarely during a “good” month (low cancellation rate), the “bad” month would be over-represented in that product’s average, so its cancellation rate would look high even if it performed better than other products in every month it appeared. Since the Markov chain method separates the effect of the month from the effect of the product, we could account for uneven distributions of products across “good” and “bad” months.

### Why did you only model customers who eventually unsubscribed?

Including customers who are still subscribed biases the data. Still-subscribed customers have no cancellation information yet, so they’re missing the key event describing their behavior; including them is like including only the first part of an unsubscribed customer’s journey. Adding them paints an overly optimistic picture for the first few month coefficients: the optimization is overwhelmed by terms showing that the month 1 to 2 and month 2 to 3 transitions were successful, so the month coefficients for the first few months go to 0 (zero probability of canceling in an early month) and the other coefficients go to nonsensical values. If we wanted to include still-subscribed customers, we’d have to add a state to the Markov chain that accounted for the uncertainty around how long they’d stay subscribed to SSB’s subscription.

### Why was the MLE estimate different than the actual parameter values?

When creating the sample dataset, we used the real parameter values and simulated each transition. Because we were simulating probabilities, there wasn’t a guarantee that the number of transitions between each state was equal to the expected values. This means we saw more happy to happy transitions than expected, so our estimate was greater than the true value of the parameter.
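This sampling noise is easy to reproduce: simulate 21 four-transition sequences from the true parameters and re-estimate by counting, and the estimates will typically land near, but not exactly on, the true values (this is a sketch, not the actual simulation used to build the table above):

```python
import random

random.seed(42)

P_HH, P_SS = 0.3, 0.4  # the true parameters used to generate the sample data

def next_state(s):
    if s == "h":
        return "h" if random.random() < P_HH else "s"
    return "s" if random.random() < P_SS else "h"

counts = {"hh": 0, "hs": 0, "sh": 0, "ss": 0}
for _ in range(21):              # 21 people, as in the table above
    s = random.choice("hs")
    for _ in range(4):           # 4 transitions each
        t = next_state(s)
        counts[s + t] += 1
        s = t

# Counting estimates drift from the true values because of sampling noise.
p_hh_hat = counts["hh"] / (counts["hh"] + counts["hs"])
p_ss_hat = counts["ss"] / (counts["ss"] + counts["sh"])
print(round(p_hh_hat, 2), round(p_ss_hat, 2))
```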

Do you have an idea for an interesting data science project? Let’s hear it! Contact Klaviyo’s data science team at datascience@klaviyo.com.
