Clarification Required for Frequency-Recency Matrix Interpretation

Alireza_Ghaffari · May 20, 2024, 6:53am

Hi there,

I have a problem interpreting the frequency-recency chart that shows the probability of returning customers by color using matplotlib.pyplot.imshow. This chart is the output of the clv.plot_probability_alive_matrix method, and it seems counterintuitive to me.

Shouldn’t the probability increase when we move along the x-axis for a fixed recency (e.g., 500) all the way to the right? Because customers with higher frequency are supposed to be more likely to return, aren’t they? How do you interpret it?

ricardoV94 · May 20, 2024, 9:05pm

It may be that it’s very unlikely for customers with high frequency to have such a recency gap, so it’s more likely that they have churned?

Alireza_Ghaffari · May 20, 2024, 10:22pm

I’m actually mixed up. Could you please write a statement explaining and interpreting customers with a recency value of 500?

Is recency the age of a customer in units of time since their last purchase? If so, shouldn’t customers with more purchases be more engaged? Why are customers with fewer than 100 purchases more likely to return?

ricardoV94 · May 21, 2024, 7:19am

Recency is the time starting at t0 of the last purchase. Bigger numbers mean more recent purchases IIRC

ricardoV94 · May 21, 2024, 7:22am

Unless there’s an error, the explanation is that customers with very high frequency were probably buying every time period. If they now go 10-20 periods (an arbitrary number) without buying, this is so different from their usual purchase behavior that it’s more likely they dropped out, all else being equal.

If your grandmother goes out to buy bread every day for over a decade and one day she doesn’t show up, the neighbors may call you to check if she is alright. If it’s a business guy that’s out of town half of the time, nobody bats an eye when he doesn’t show up in the bakery for a month. The more reliable you are, the more strange it is that you didn’t show up recently (according to this model).

Having said that, I would need to double check the plot code to make sure there is no obvious error. But this is the explanation I would give to the pattern.

ricardoV94 · May 21, 2024, 7:24am

The pattern in the lifetimes guide is similar: Quickstart — lifetimes 0.11.2 documentation

Except that, because the numbers are smaller, you notice a more gradual transition from high to low probability of being alive.

ricardoV94 · May 21, 2024, 7:26am

A recency of 500 is relative to a big T, which may be something like 600: the number of time periods after which you stopped collecting the data. That means the last purchase was 100 units of time ago. A recency of 50 means the last purchase was 550 units of time ago. Bigger is more recent.

It would probably be more intuitive if recency were the distance away from zero, with zero being the current time, but this is how the models were described in the literature.
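The arithmetic in this post can be written as a tiny sketch. The helper name `time_since_last_purchase` and the value T = 600 are only illustrative (600 is the "something like 600" guess above, not a number taken from the actual data).

```python
def time_since_last_purchase(recency, T):
    """Periods elapsed between the last purchase and the end of observation.

    Both arguments are measured in the same time unit; recency can
    never be larger than T.
    """
    if recency > T:
        raise ValueError("recency cannot exceed T")
    return T - recency


T = 600  # hypothetical length of the observation window
for recency in (500, 50):
    gap = time_since_last_purchase(recency, T)
    print(f"recency={recency}, T={T}: last purchase was {gap} periods ago")
```

So a larger recency means a smaller gap since the last purchase, which is the opposite of what the word suggests in everyday use.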

ColtAllen · May 21, 2024, 4:01pm

Recency is actually the number of time periods between a customer’s first and last purchase.

A fundamental limitation of this plot is that it obscures the impact of the T variable (i.e., the total observed number of time periods since the customer’s first purchase). A customer with recency=500 and T=510 made their most recent purchase only 10 time periods ago, so they are far more likely to be alive than a customer with recency=500 and T=1000. The plotting code sets T==max(recency) for all customers, which biases this chart towards the high recency customers. For this reason, I have plans to modify this plot to include the T variable, which would also make it 3-dimensional.
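To make the effect of T and of frequency concrete, here is a minimal sketch. It assumes the fitted model is a BG/NBD model (the thread does not say which model was used) and uses the standard BG/NBD expression for P(alive). The function name `bgnbd_p_alive` and the parameter values in `PARAMS` are hypothetical, chosen only for illustration and not fitted to any data.

```python
import math

# Hypothetical BG/NBD parameters (NOT fitted to the data in this thread).
PARAMS = {"r": 0.25, "alpha": 4.4, "a": 0.8, "b": 2.4}


def bgnbd_p_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) under the BG/NBD model.

    frequency: number of repeat purchases
    recency:   time between first and last purchase
    T:         time between first purchase and end of observation
    """
    if frequency == 0:
        # Under BG/NBD, dropout can only happen right after a repeat
        # purchase, so a customer with no repeat purchases counts as alive.
        return 1.0
    # Work in log space so large frequencies do not overflow.
    log_term = (
        math.log(a)
        - math.log(b + frequency - 1)
        + (r + frequency) * (math.log(alpha + T) - math.log(alpha + recency))
    )
    return 1.0 / (1.0 + math.exp(min(log_term, 700.0)))


# Same recency, different T: the shorter gap is far more likely alive.
for T in (510, 1000):
    p = bgnbd_p_alive(10, 500, T, **PARAMS)
    print(f"frequency=10, recency=500, T={T}: P(alive)={p:.3f}")

# Same recency and T, different frequency: the same 100-period gap is
# much more suspicious for a very frequent buyer.
for frequency in (1, 10, 100, 400):
    p = bgnbd_p_alive(frequency, 500, 600, **PARAMS)
    print(f"frequency={frequency}, recency=500, T=600: P(alive)={p:.3g}")
```

With these illustrative parameters, P(alive) falls as T grows for a fixed recency, and it also falls as frequency grows for a fixed recency and T, which is the pattern the chart shows along the x-axis.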

Alireza_Ghaffari · May 21, 2024, 6:09pm

Based on the answers, can we have the chart’s y-axis labeled as “Current Time(max date) - Last Purchase Date” instead of “Recency” with this meaning? This definition will give us a better intuitive understanding.
