Gamma-Gamma Spend Model: differences in likelihood vs distribution fit

Hi there, first off I’d just like to thank y’all for the awesome work you have put in both to PYMC and the extensive guides and documentation!

I am working to build a Gamma-Gamma model of customer spend. I can fit a model using the pm.Potential method similar to the PYMC Marketing method. But I’m curious why a model that is specified as a series of distributions and hyper-parameters doesn’t converge to the same results?

Here is my distribution model:

with pm.Model(coords=coord_gamma) as gamma_model_simple:
    q = pm.Uniform('q', lower = 1, upper = 10)
    y = pm.HalfNormal('y', sigma=3)
    v = pm.Gamma('v', alpha=q, beta=y)
    
    p = pm.HalfNormal('p', sigma=5)
    
    z = pm.Gamma('z', alpha=p*gamma_sample_df['num_orders'],
                 beta=v*gamma_sample_df['num_orders'], 
                 observed=gamma_sample_df['aov'])        
    
    gamma_model_simple_trace = pm.sample()

Where gamma_sample_df is my user-level observation data of average order value and the num_orders for each customer.

When I run this the model fits with no issues but the parameter estimates for p, q, and y are way off from the pm.Potential model:

with pm.Model() as model:
    p = pm.HalfNormal(name="p", sigma=10)
    q = pm.HalfNormal(name="q", sigma=10)
    v = pm.HalfNormal(name="v", sigma=10)

    def logp(x, m):
        return (
            pt.gammaln(p * x + q)
            - pt.gammaln(p * x)
            - pt.gammaln(q)
            + q * pt.log(v)
            + (p * x - 1) * pt.log(m)
            + (p * x) * pt.log(x)
            - (p * x + q) * pt.log(x * m + v)
        )

    likelihood = pm.Potential(name="likelihood", var=logp(x, m))

Unfortunately I can’t share any data, but I was hoping to get some general advice on why the approaches would differ? My ultimate goal is to fit a multilevel model and add customer-specific features to improve the estimates and I’m assuming that would be easier with the fully specific model.

Thanks in advance!

q and y are going to be pretty much the same as the prior as they only inform a single variable.

You may be interested in this dev notebook on the gamma-gamma. I also tried a direct model on summary stats which failed, in that case because it ends up with too many variables per observation: pymc-marketing/docs/source/notebooks/clv/dev/gamma_gamma_dev.ipynb at main · pymc-labs/pymc-marketing · GitHub

I could only model directly (without marginalizing via the potential) when using the individual observations instead of the summary stats.

I was trying something different then what you’re doing.

Perhaps @ColtAllen knows something regarding customer specific effects,

Hey @statmills ,

Your results differ because the first model estimates a spend distribution for an individual customer, whereas the pm.Potential model is for an entire population. The likelihood expression is derived with respect to the population expectation.

The Gamma-Gamma likelihood can be modified to support covariates for individual customers, which is something we want to see added to pymc-marketing in the coming months. Here’s the research note if you’re curious:

https://www.brucehardie.com/notes/045/

2 Likes

Thank you both @ColtAllen and @ricardoV94 for the quick responses. I think I understand the difference between the models so thanks for sharing the reference links. I think that I have enough to move forward with the likelihood model with a better understanding of how it works. Thanks!