# Access probability p parameter from pm.BetaBinomial distribution during sample_ppc

Hello,

I was wondering if there is any way to access the probability parameter of the binomial distribution during the `sample_ppc` call. My model is constructed so that alpha and beta are deterministic values derived from other distributions.

I am most interested in the `p` parameter of the binomial distribution, but setting the model up with a beta distribution that feeds into the binomial distribution results in much, much slower sampling.

So, to that end, I would very much appreciate help with either getting the `p` parameter from the `BetaBinomial` distribution during `sample_ppc`, or fixing the sampling issues with the theoretically equivalent model.

Here is the model for reference:

```
with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0.0, sd=5, shape=1)

    for covariate in covariates:
        if covariates[covariate]['type'] == 'categorical':
            shape = covariates[covariate]['encoder'].classes_.size
        elif covariates[covariate]['type'] in ['metric', 'binary']:
            shape = 1

        sigma = pm.HalfCauchy(f'{covariate}_coeff_sigma', beta=5)
        offset = pm.Normal(f'{covariate}_coeff_offset', mu=0, sd=1, shape=shape)
        coeff = pm.Deterministic(f'{covariate}_coeff', 0.0 + offset * sigma)

        if shape > 1:
            ...
        else:
            ...

    kappa = pm.Exponential('kappa', lam=1e-4)

    alpha = pm.Deterministic('alpha', omega * kappa + 1)
    beta = pm.Deterministic('beta', (1 - omega) * kappa + 1)

    p = pm.Beta('p', alpha=alpha, beta=beta, shape=p_shape)

    likelihood = pm.Binomial(
        'likelihood', p=p, n=model_variables['n'], observed=model_variables['y_obs']
    )
    # The likelihood parameterization below results in much faster sampling,
    # but does not let me draw the p parameter from the posterior distribution:
    # likelihood = pm.BetaBinomial('likelihood', alpha=alpha, beta=beta, n=model_variables['n'], observed=model_variables['y_obs'])
```

Thanks!


I don't think it is possible to get the `p` from the `BetaBinomial` distribution easily. You could, though, write a forward random function that draws the samples from the trace:

```
import numpy as np

# after you do the sampling
post_alpha = trace['alpha']
post_beta = trace['beta']
nsample = 1000
# resample posterior draws of alpha and beta with replacement,
# then draw p from the corresponding Beta distribution
index = np.random.randint(0, post_alpha.shape[0], nsample)
post_p = np.random.beta(post_alpha[index], post_beta[index], size=(p_shape, nsample))
```

How would this work with switching the `model_variables` for the covariates from train data to test data, though?

I was overcomplicating the problem. Actually, you can try this: after you sample the model, create the RV `p` and sample it:

```
# after sampling with the model that uses likelihood = pm.BetaBinomial(...)
with model:
    p = pm.Beta('p', alpha=alpha, beta=beta, shape=p_shape)
    ppc = pm.sample(trace, 1000, vars=[p])
```

This sounds like exactly what I wanted. Thank you!

Out of curiosity, is it expected for the Beta-plus-Binomial model and the BetaBinomial model to sample at completely different speeds?

Yes, the marginalization makes it much faster, which is why it was written as a separate distribution.
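For reference, the speedup comes from integrating `p` out analytically, so the sampler never has to explore it; the resulting closed form is exactly the `BetaBinomial` pmf:

```latex
P(y \mid n, \alpha, \beta)
  = \int_0^1 \operatorname{Binomial}(y \mid n, p)\,\operatorname{Beta}(p \mid \alpha, \beta)\,dp
  = \binom{n}{y}\,\frac{B(y + \alpha,\; n - y + \beta)}{B(\alpha, \beta)}
```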


Makes perfect sense and is what I was thinking. Thank you very much for your help.

Hmm, the code you provided is actually failing. It's saying `alpha` does not exist, and when I use `model.alpha` I get the following `TypeError`:

`TypeError: '<' not supported between instances of 'MultiTrace' and 'int'`

Did you overwrite `alpha` at some point?
If so, you can access alpha and beta again by doing:

```
model.deterministics
```

say the output is `[alpha, beta]`; you can then do

```
alpha = model.deterministics[0]
beta = model.deterministics[1]
```

Yeah, that was an error on my part.

Using your suggestion, the failure is now that the model can't access the `p_logodds__` transformation of the beta distribution.

Here is the code:

```
# Access the number of observations from the shared tensor model_variables['y_obs']
p_shape = model_variables['y_obs'].container.storage[0].size

model = construct_model(model_variables, model_covariates)

with model:
    parameters = {
        str(text): text for text in model.deterministics if str(text) in ['alpha', 'beta']
    }
    p = pm.Beta('p', alpha=parameters['alpha'], beta=parameters['beta'], shape=p_shape)

    posterior = pm.sample(draws=1000, trace=trace, vars=[p])
```

I am not sure exactly how you are building your model; it seems you are doing something complex enough that the theano graph ends up detached from an input.
Here is a minimalistic example that works:

```
import numpy as np
import pymc3 as pm

p_shape = 10
with pm.Model() as model:
    alpha = pm.HalfNormal('alpha', 5.)
    beta = pm.HalfNormal('beta', 5.)
    y = pm.BetaBinomial('obs', n=10, alpha=alpha, beta=beta,
                        observed=np.random.binomial(10, .7, size=p_shape))
    trace = pm.sample()

with model:
    p = pm.Beta('p', alpha, beta, shape=p_shape)
    ppc = pm.sample_ppc(trace, 1000, vars=[p])
```

I'm basically dynamically creating a linear equation from the covariates, feeding it through the inverse-logit function, and using the result as the mode of the Beta distribution that I'm trying to sample:

```
# Takes as input covariates, which is a dictionary of covariate names and sklearn encoders,
# and model_variables, which contains the same keys but with theano shared tensors as values.
with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0.0, sd=5, shape=1)

    for covariate in covariates:
        if covariates[covariate]['type'] == 'categorical':
            shape = covariates[covariate]['encoder'].classes_.size
        elif covariates[covariate]['type'] in ['metric', 'binary']:
            shape = 1

        sigma = pm.HalfCauchy(f'{covariate}_coeff_sigma', beta=5)
        offset = pm.Normal(f'{covariate}_coeff_offset', mu=0, sd=1, shape=shape)
        coeff = pm.Deterministic(f'{covariate}_coeff', 0.0 + offset * sigma)

        if shape > 1:
            ...
        else:
            ...

    kappa = pm.Exponential('kappa', lam=1e-4)

    alpha = pm.Deterministic('alpha', omega * kappa + 1)
    beta = pm.Deterministic('beta', (1 - omega) * kappa + 1)

    likelihood = pm.BetaBinomial(
        'likelihood', alpha=alpha, beta=beta, n=model_variables['n'], observed=model_variables['y_obs']
    )

    # Rescale coefficients to be deflections from baseline
    for covariate in covariates:
        if covariates[covariate]['type'] == 'categorical':
            ...
        elif covariates[covariate]['type'] == 'metric':
            pass

return model
```
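Roughly, the elided branches accumulate the linear predictor and squash it through the inverse logit; here is a hypothetical sketch of that part (the exact accumulation is omitted above, so the indexing here is illustrative):

```python
# Hypothetical sketch of the elided branches: accumulate the linear
# predictor across covariates, then map it into (0, 1) for the Beta mode.
link_argument = intercept
for covariate in covariates:
    # ... sigma/offset/coeff built as above ...
    if shape > 1:
        # categorical covariate: index into the coefficient vector
        link_argument = link_argument + coeff[model_variables[covariate]]
    else:
        # metric/binary covariate: scale the coefficient by the value
        link_argument = link_argument + coeff * model_variables[covariate]

# mode of the Beta distribution via the inverse logit
omega = pm.Deterministic('omega', pm.math.invlogit(link_argument))
```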

Try `return model, alpha, beta`. But I think there are more efficient ways to build the `covariate` terms than a for-loop with `link_argument += ...`

Also, a lazy quick fix is to build the exact same model again without the observed likelihood but with `p`, and sample the ppc using the second model with the trace from the first model.
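A minimal sketch of that quick fix, reusing the toy example from above; the second model only needs matching variable names so that `sample_ppc` can look the posterior draws up in the trace:

```python
# Hypothetical sketch: a second model with the same priors and names but no
# observed likelihood, used only to forward-sample p with the first trace.
with pm.Model() as unobserved_model:
    alpha = pm.HalfNormal('alpha', 5.)
    beta = pm.HalfNormal('beta', 5.)
    p = pm.Beta('p', alpha, beta, shape=p_shape)

with unobserved_model:
    # trace comes from sampling the original, observed model
    ppc = pm.sample_ppc(trace, 1000, vars=[p])
```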

Actually, I don't think I need to rebuild the model. I just added the parameter and sampled from the ppc like so:

```
with model:
    p = pm.Beta('p', alpha=model.named_vars['alpha'], beta=model.named_vars['beta'], shape=p_shape)

    posterior = pm.sample_ppc(trace, samples=500, vars=[model.p])
```
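As for switching from train to test data (my question above), the idea would be to swap the values of the theano shared tensors before drawing `p` again; a hypothetical sketch, assuming a `test_variables` dict mirroring `model_variables` (the test set has to keep the same number of rows, since `p_shape` was fixed when `p` was created):

```python
# Hypothetical: point the shared tensors at the test data, then
# forward-sample p again with the same trace. test_variables is an
# illustrative dict with the same keys as model_variables.
for name, value in test_variables.items():
    model_variables[name].set_value(value)

with model:
    posterior = pm.sample_ppc(trace, samples=500, vars=[model.p])
```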

Actually, I think I wasted your time. I shouldn't have used `sample`, but rather `sample_ppc`, from the beginning.

My apologies.

LOL, no worries. I totally didn't see it either.