Testing difference between two Negative Binomial distributions

Hi, I am completely new to Bayesian statistics and pymc3 and have run into trouble.
I am trying to measure the change in the number of deaths in hospitals before and after the intervention. I decided to mostly follow the BEST procedure (Bayesian Estimation Supersedes the T-Test — PyMC3 3.10.0 documentation) to evaluate the effect. However, I am not sure whether my model is correct.

My questions are:

  • is my model adequate for the situation?
  • can we use the difference of means in this case? (the resulting posterior distributions don’t look normal to me)
  • what are other ways to evaluate the pre and post difference?

Here is my code:

import numpy as np
import pymc3 as pm

pre_mu = pre.mean()
pre_sd = pre.std() * 2
post_mu = post.mean()
post_sd = post.std() * 2
σ_low = 0
σ_high = 10

with pm.Model() as model:
    
    # priors on the group means, loosely centered on the sample statistics
    pre_mean = pm.Normal("pre_mean", mu=pre_mu, sd=pre_sd)
    post_mean = pm.Normal("post_mean", mu=post_mu, sd=post_sd)
    
    # priors on the group scales
    pre_std = pm.Uniform("pre_std", lower=σ_low, upper=σ_high)
    post_std = pm.Uniform("post_std", lower=σ_low, upper=σ_high)
    
    diff_of_means = pm.Deterministic("difference of means", pre_mean - post_mean)
    diff_of_stds = pm.Deterministic("difference of stds", pre_std - post_std)
    effect_size = pm.Deterministic(
        "effect size", diff_of_means / np.sqrt((pre_std ** 2 + post_std ** 2) / 2)
    )
    

    # likelihoods
    group1 = pm.NegativeBinomial("pre", mu=pre_mu, alpha=pre_sd, observed=pre_nzk)
    group2 = pm.NegativeBinomial("post", mu=post_mu, alpha=post_std, observed=post_nzk)
    
    trace = pm.sample(5000)

I think most of your questions would be best answered knowing what form your data is in. Does your data consist of pre-post pairs? Or do you have a pile of pre observations and a separate pile of post observations? You mention the empirical distribution not looking well-behaved. What does it actually look like?


Thanks for your response!

  1. No, they are not pairs, but two observational periods (pre and post), where the number of lethal cases/day was recorded as a single number.
  2. Attaching trace plot.

Then I would say what you have is at least reasonable. Two quick comments, however. First, uniform priors are often not a great idea because the discontinuities at each end of the interval give the sampler a hard time. Sampling is often easier if you use something like a gamma prior to “encourage” the sampler to stay >=0. Second, you have a large number of divergences (the tick marks along the x-axes) that should discourage you from interpreting the results you are currently getting. I suspect that these two points are related, but the divergences could arise from other sources as well.
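
For example, something along these lines (just a sketch; the gamma hyperparameters here are purely illustrative, not a recommendation for your data):

    # a gamma prior keeps the scale positive without the hard
    # boundaries of Uniform(0, 10)
    pre_std = pm.Gamma("pre_std", alpha=2, beta=0.5)
    post_std = pm.Gamma("post_std", alpha=2, beta=0.5)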


I reparametrized using a Lognormal instead of the Negative Binomial, as I converted raw lethal cases to lethal cases per hospitalized patient. However, now I receive a SamplingError and I can’t figure out what’s wrong.

import numpy as np
import pymc3 as pm

pre_mu = pre_nzk.mean()
pre_sd = pre_nzk.std() * 2
post_mu = post_nzk.mean()
post_sd = post_nzk.std() * 2


alpha = 1
beta = 5

with pm.Model() as model:
    
    pre_mean = pm.Normal("pre_mean", mu=pre_mu, sigma=pre_sd)
    post_mean = pm.Normal("post_mean", mu=post_mu, sigma=post_sd)
    
    pre_std = pm.Gamma("pre_std", alpha=alpha, beta=beta)
    post_std = pm.Gamma("post_std", alpha=alpha, beta=beta)
    
    diff_of_means = pm.Deterministic("difference of means", pre_mean - post_mean)
    diff_of_stds = pm.Deterministic("difference of stds", pre_std - post_std)
    effect_size = pm.Deterministic(
        "effect size", diff_of_means / np.sqrt((pre_std ** 2 + post_std ** 2) / 2)
    )
    

    group1 = pm.Lognormal("pre", mu=pre_mean, sigma=pre_std, observed=pre_nzk)
    group2 = pm.Lognormal("post", mu=post_mean, sigma=post_std, observed=post_nzk)
    
    trace = pm.sample(5000, target_accept=0.99)

And the error is:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'pre_mean': array(3.67540805), 'post_mean': array(2.31661707), 'pre_std_log__': array(-1.60943791), 'post_std_log__': array(-1.60943791)}

Initial evaluation results:
pre_mean           -2.58
post_mean          -1.99
pre_std_log__      -1.00
post_std_log__     -1.00
pre                  NaN
post             -144.00
Name: Log-probability of test_point, dtype: float64

I had to make some guesses about what your data looks like so that I could run your model. But it seems to sample ok for me. So maybe an awkward match between your priors and your data?
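
One thing worth checking (an assumption on my part, based on the pre NaN line in your output): a lognormal likelihood only supports strictly positive observations, so a single zero in pre_nzk would make its log-probability NaN at the starting point. A quick sketch:

import numpy as np

# any zero or negative observation breaks a lognormal likelihood
print((pre_nzk <= 0).sum(), (post_nzk <= 0).sum())

# per-variable log-probability at the starting point
print(model.check_test_point())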

Sorry for my ignorance here, but why wouldn’t you want to test the difference in means, etc., using the sampled group1 and group2 negative binomial parameters (e.g. mu)? I would like to use BEST for a negative binomial model but would like to confirm how to apply it. Thanks!


This appears to have been the rationale for moving away from the negative binomial.

The original model is basically a BEST model (with negative binomials instead of t distributions), though it omits all the decision-theoretic procedures (defining ROPEs, comparing HDIs to ROPEs, power analyses, etc.) that Kruschke layers on top of the actual estimation/inference.
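
Those pieces are easy to bolt on after sampling, e.g. with arviz (a sketch; the ROPE bounds below are placeholders you would choose for your problem):

import arviz as az

# compare the posterior of the difference to a region of practical
# equivalence (ROPE) around zero
az.plot_posterior(trace, var_names=["difference of means"], ref_val=0, rope=(-0.5, 0.5))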

In the first model, the difference of means and stds uses the Normal and Uniform prior parameters, respectively, not the mu and alpha parameters from the Negative Binomial. My question is: wouldn’t you want to test the differences after fitting the NB model conditioned on the data? The group1 and group2 NBs are unconnected to the BEST comparison. Apologies if I’m misreading/misinterpreting this, since I’m new to this application. Thanks.


I’m not sure what you mean by that. Each NB is directly tied to the data (e.g., observed=pre_nzk and observed=post_nzk), and each NB has its mean and SD parameterized (e.g., pre_mean, pre_std, etc.), and those parameter estimates are then compared in the deterministic diff_of_means, diff_of_stds, and effect_size.

@cluhmann I am also confused about this application. I think what @dv_mepi is saying is: shouldn’t we compare the posterior mean of the difference between group2’s posterior distribution and group1’s posterior distribution? I don’t see how the current deterministic “diff of means” is connected to the data. I do recognize that the priors pre_mean and post_mean are informed by pre_mu, pre_sd, post_mu, and post_sd, but it is not clear that these estimates are informed by the observations in the data. Does the code in the with block not run sequentially, top to bottom?


Huh. I have no recollection of the above thread, so this is based on a super quick look at what is above. But there may be a typo in the original code (or the follow-up). There is both pre_mu and pre_mean running around, which is always a bit confusing (to me). The scalars pre_mu and post_mu are used in the likelihoods, but the priors pre_mean and post_mean are not. I suspect that they should be switched inside the likelihoods:

    # the priors, not the scalar data summaries, feed the likelihoods
    group1 = pm.NegativeBinomial("pre",
                                 mu=pre_mean,
                                 alpha=pre_std,
                                 observed=pre_nzk
                                 )
    group2 = pm.NegativeBinomial("post",
                                 mu=post_mean,
                                 alpha=post_std,
                                 observed=post_nzk
                                 )

Otherwise, pre_mean and post_mean don’t actually do anything. Does that make sense? If not, I can have a closer look.

Also, if that is correct, I can put together a full NegativeBinomial mean-testing bit of model code just for posterity’s sake.
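
Something like this, for what it’s worth (a sketch, assuming pre_nzk and post_nzk are 1-D arrays of daily counts; the prior choices are illustrative, not prescriptive):

import pymc3 as pm

with pm.Model() as nb_model:
    # priors on the NB means, truncated at zero because mu must be positive
    pre_mean = pm.TruncatedNormal("pre_mean", mu=pre_nzk.mean(), sigma=pre_nzk.std() * 2, lower=0)
    post_mean = pm.TruncatedNormal("post_mean", mu=post_nzk.mean(), sigma=post_nzk.std() * 2, lower=0)

    # priors on the NB dispersions; gamma keeps them positive
    pre_alpha = pm.Gamma("pre_alpha", alpha=2, beta=0.5)
    post_alpha = pm.Gamma("post_alpha", alpha=2, beta=0.5)

    # likelihoods tie the parameters to the observed counts
    pm.NegativeBinomial("pre", mu=pre_mean, alpha=pre_alpha, observed=pre_nzk)
    pm.NegativeBinomial("post", mu=post_mean, alpha=post_alpha, observed=post_nzk)

    # quantity of interest, saved in the trace
    pm.Deterministic("difference of means", pre_mean - post_mean)

    trace = pm.sample(2000, tune=2000, target_accept=0.9)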

Yes, that is what I also believe, and the OP corrects the error when changing the likelihood from the negative binomial to the lognormal. However, the question is: shouldn’t the OP have compared group1 vs group2 by taking the difference of their posteriors, instead of looking at the deterministic diff_of_means variable, which does not seem to be conditioned on the actual observations of the data?

If pre_mean and post_mean are plugged into the likelihood, then the deterministic diff_of_means will be equivalent to post-processing the trace and calculating the distribution of the difference.
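
For example, with the variable names from the original model:

# post-processing the trace gives the same distribution as the Deterministic
diff_samples = trace["pre_mean"] - trace["post_mean"]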

Sort of, but not really. The model code is used to construct a computational graph, but no computations are actually performed as the lines of Python inside the model context are executed. The computations are performed (by pushing data through the computational graph) when you call pm.sample() or pm.sample_prior_predictive(), etc.

So as (I think) I said above, pre_mean and post_mean are (or should be) connected to the data in those 2 likelihoods. These lines:

    pre_mean = pm.Normal("pre_mean", mu=pre_mu, sd=pre_sd)
    post_mean = pm.Normal("post_mean", mu=post_mu, sd=post_sd)

specify the prior for these parameters. The posterior is in the trace.

Apologies if I am either not answering your question or rehashing things from above. Like I said, this thread is long forgotten.

This is exactly answering my question. So even though the pm.Deterministic comes before the pm.NegativeBinomial in the Python code, under the hood it will be conditioned on the observations within the data. Thank you very much.


Exactly. pm.Deterministic is used to save “intermediate” calculations into the trace. Sometimes you need to do stuff like this:

with pm.Model() as model:
    theta_raw = pm.Normal("theta_raw", 0, 1)
    # exp() maps the unconstrained parameter to a positive rate
    like = pm.Poisson("like", mu=pm.math.exp(theta_raw), observed=[2, 5, 8])

But then you lose the transformed version of theta_raw. You can just apply np.exp() to the samples in the posterior, but maybe it makes more sense to save the transformed values in the trace:

with pm.Model() as model:
    theta_raw = pm.Normal("theta_raw", 0, 1)
    theta = pm.Deterministic("theta", pm.math.exp(theta_raw))  # saved in the trace
    like = pm.Poisson("like", mu=theta, observed=[2, 5, 8])

Thank you, this is very helpful!
