Hi there,
I have a problem interpreting the frequency-recency chart that shows the probability of returning customers by color using matplotlib.pyplot.imshow. This chart is the output of the clv.plot_probability_alive_matrix method, and it seems counterintuitive to me.
Shouldn’t the probability increase when we move along the x-axis for a fixed recency (e.g., 500) all the way to the right? Because customers with higher frequency are supposed to be more likely to return, aren’t they? How do you interpret it?
Or is it that customers with high frequency are very unlikely to have such a long gap since their last purchase, so it’s more likely they have churned?
I’m actually mixed up. Could you please write a statement explaining and interpreting customers with a recency value of 500?
Is recency the age of a customer in units of time since their last purchase? If so, shouldn’t customers with more purchases be more engaged? Why are customers with fewer than 100 purchases more likely to return?
Recency is the time of the last purchase, measured from t0. Bigger numbers mean more recent purchases, IIRC.
Unless there’s an error, the explanation is that customers with very high frequency were probably buying every time period. If they now go 10-20 periods (an arbitrary number) without buying, this is so different from their usual purchase behavior that it’s more likely they dropped out, all else being equal.
If your grandmother goes out to buy bread every day for over a decade and one day she doesn’t show up, the neighbors may call you to check if she is alright. If it’s a business guy that’s out of town half of the time, nobody bats an eye when he doesn’t show up in the bakery for a month. The more reliable you are, the more strange it is that you didn’t show up recently (according to this model).
Having said that, I would need to double check the plot code to make sure there is no obvious error. But this is the explanation I would give to the pattern.
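To make the pattern concrete, here is a minimal sketch that assumes the underlying model is BG/NBD and uses its closed-form P(alive) expression. The function name `bgnbd_p_alive` and the parameter values (r, alpha, a, b) are made up for illustration; they are not taken from the fitted model in this thread or from the library's plotting code.

```python
import math


def bgnbd_p_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive) for one customer under the BG/NBD model (closed form)."""
    if frequency == 0:
        # With no repeat purchases the BG/NBD model gives P(alive) = 1.
        return 1.0
    # Log-odds of having dropped out; it grows with frequency when recency < T.
    log_odds_dropped = (
        (r + frequency) * math.log((alpha + T) / (alpha + recency))
        + math.log(a / (b + frequency - 1))
    )
    if log_odds_dropped > 700:
        # Avoid overflow in math.exp; the probability is effectively zero here.
        return 0.0
    return 1.0 / (1.0 + math.exp(log_odds_dropped))


# Hypothetical parameter values, chosen only for illustration.
params = dict(r=0.25, alpha=4.4, a=0.8, b=2.4)

# Fix recency=500 and T=600, then increase the frequency.
probs = {
    x: bgnbd_p_alive(x, recency=500, T=600, **params)
    for x in (10, 50, 100, 200)
}
for x, p in probs.items():
    print(f"frequency={x:>3}  P(alive)={p:.4f}")
```

With these assumed values the probability falls as frequency rises, which is the bakery story in numbers: the same 100-period silence is much more surprising for a customer who used to buy very often.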
The pattern in the lifetimes guide is similar: Quickstart — lifetimes 0.11.2 documentation
Except that, because the numbers are smaller, you notice a more gradual transition from high to low probability of being alive.
A recency of 500 is relative to a big T, which may be something like 600: the number of time periods after which you stopped collecting the data. That means the last purchase was 100 units of time ago. A recency of 50 means the last purchase was 550 units of time ago. Bigger is more recent.
It would probably be more intuitive if recency was distance away from zero, with zero being the current time, but this is how the models were described in the literature.
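The arithmetic above can be written as a tiny helper. This is only a sketch: `periods_since_last_purchase` is a hypothetical name, and T=600 is the illustrative value used in this post, not something read from the data.

```python
def periods_since_last_purchase(recency, T):
    """Time elapsed since the last purchase, given recency and observation length T."""
    if recency > T:
        raise ValueError("recency cannot exceed T")
    return T - recency


T = 600  # illustrative observation length, as in the post above
for recency in (500, 50):
    gap = periods_since_last_purchase(recency, T)
    print(f"recency={recency:>3}  last purchase was {gap} periods ago")
```

The quantity returned here is the "distance from the current time" that would arguably be more intuitive to plot than recency itself.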
Recency is actually the number of time periods between a customer’s first and last purchase.
A fundamental limitation of this plot is that it obscures the impact of the T variable (i.e., the total observed number of time periods since the customer’s first purchase). A customer with recency=500 and T=510 made their most recent purchase only 10 time periods ago, so they are far more likely to be alive than a customer with recency=500 and T=1000. The plotting code sets T = max(recency) for all customers, which biases this chart towards the high-recency customers. For this reason, I have plans to modify this plot to include the T variable, which would also make it 3-dimensional.
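As a rough illustration of how much T matters, the sketch below holds recency and frequency fixed and varies only T. It assumes a BG/NBD model and uses its closed-form P(alive). The parameter defaults and the frequency of 20 are invented for the example, and `p_alive` is a hypothetical helper, not the library's implementation.

```python
import math


def p_alive(frequency, recency, T, r=0.25, alpha=4.4, a=0.8, b=2.4):
    """BG/NBD P(alive); the default parameters are hypothetical."""
    if frequency == 0:
        return 1.0
    log_odds_dropped = (
        (r + frequency) * math.log((alpha + T) / (alpha + recency))
        + math.log(a / (b + frequency - 1))
    )
    # Guard against overflow in math.exp for extreme inputs.
    return 0.0 if log_odds_dropped > 700 else 1.0 / (1.0 + math.exp(log_odds_dropped))


# Same recency and frequency, different observation lengths T.
by_T = {T: p_alive(frequency=20, recency=500, T=T) for T in (510, 600, 1000)}
for T, p in by_T.items():
    print(f"recency=500  T={T:>4}  P(alive)={p:.6f}")
```

Under these assumptions the customer with T=510 comes out far more likely to be alive than the one with T=1000, even though both sit on the same row of a frequency-recency matrix.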
Based on the answers, can we have the chart’s y-axis labeled as “Current Time(max date) - Last Purchase Date” instead of “Recency” with this meaning? This definition will give us a better intuitive understanding.