Reproducing an example from PyCon 2017

I’m trying to do by myself the example of Chris Fonnesbeck in Pycon 2017: Christopher Fonnesbeck - Introduction to Statistical Modeling with Python - PyCon 2017 - YouTube, from 2:06:00. (I have my own data).

My code (some differences with Fonnesbeck’s code):

with pm.Model() as my_model:
    
    a_first = pm.Uniform("a_first", 0, 10)
    a_second = pm.Uniform("a_second", 0, 10)
    
    b_first = pm.Uniform("b_first", 0, 5)
    b_second = pm.Uniform("b_second", 0, 5)
    
    # likelihood
    first_weeks = pm.Gamma("first_weeks", alpha=a_first, beta=b_first, observed=first_weeks_arr)
    second_weeks = pm.Gamma("second_weeks", alpha=a_second, beta=b_second, observed=second_weeks_arr)
    
    # Gamma's mean: α/β
    d = pm.Deterministic("d", a_first/b_first - a_second/b_second)
    
    # Fit
    #samples = pm.fit(20000, method="svgd").sample(1000) # I won't use VI
    samples = pm.sample(20000)

with my_model:
    print(az.summary(samples))
    az.plot_posterior(samples, var_names=["d"], ref_val=0)
    az.plot_trace(samples)

imagen

with my_model:
    pred = pm.sample_posterior_predictive(samples, var_names=["d"])
    az.plot_posterior(pred, var_names="d", ref_val=0)

imagen

My 2 questions:

  1. I think the mean of the deterministic variable “d” does not answer the question of when the first_weeks are above the second_weeks, right? In these versions of PyMC3 it would be the percentages above 0, 61.5% of the times first_weeks > second_weeks. Am I right here?

  2. Now I want to do what Chris does from 2:16:00 onwards, that is, instead of asking of d using the historical data, what is the probability for a specific point of the future, I think this is sample_posterior_predictive, but Chris does it differently, perhaps sample_posterior_predictive didn’t exist in 2017. Would be the last part of my code valid? I obtain the same image, so I think there’s something I am missing…

Thx.

Ok, I think I got it. For the second question (the first, I am 90% confident it’s ok), would be something like:

with my_model:    
    # To find difference of the posteriors we find the mean of each
    # num_chains*num_samples_per_chain and then subtract them
    mean_1st_weeks = np.mean(pred["first_weeks"], axis=1)
    print(mean_1st_weeks.shape)
    mean_2nd_weeks = np.mean(pred["second_weeks"], axis=1)
    print(mean_2nd_weeks.shape)
    
    # Now:
    diff = mean_1st_weeks - mean_2nd_weeks
    print(diff.shape)
    print(diff)

    az.plot_posterior(diff, ref_val=0)

imagen
With an increase of the 94% HDI width, as one would expect from a posterior sample. Now I think this second question is ok with 90% of confidence, too.