Using shape parameter vs using specifying it using a different model altogether

MayankSatnalika · July 20, 2018, 8:17pm

In the tutorial here http://docs.pymc.io/notebooks/getting_started, under Getting started Linear Regression, we have

import pymc3 as pm

# X1, X2 and Y are the data arrays defined earlier in the tutorial
basic_model = pm.Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

We need alpha, beta (which has 2 values, b_0 and b_1), and sigma. What is the difference between following the above and using something like


beta0 = pm.Normal('beta0', mu=0, sd=10)
beta1 = pm.Normal('beta1', mu=0, sd=10)

mu = alpha + beta0*X1 + beta1*X2

In the first case, judging from the trace plot, the 2 values of beta also seem to come from completely different distributions. What is the difference between the 2 approaches?

junpenglao · July 20, 2018, 9:15pm

There should be no differences between these two approaches.
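To see why, here is a minimal plain-Python sketch (not PyMC3 code; the helper names `normal_logpdf`, `vector_prior_logp` and `separate_prior_logp` are hypothetical) of what the prior contributes to the model's log-density in each case. It assumes that a `shape=2` Normal with scalar `mu` and `sd` is treated as two independent components, which is the behavior described in this thread.

```python
import math

def normal_logpdf(x, mu, sd):
    """Log-density of a Normal(mu, sd) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)

def vector_prior_logp(beta, mu=0.0, sd=10.0):
    """Prior log-density of one vector-valued beta (the shape=2 style)."""
    return sum(normal_logpdf(b, mu, sd) for b in beta)

def separate_prior_logp(beta0, beta1, mu=0.0, sd=10.0):
    """Prior log-density of two separately declared scalar betas."""
    return normal_logpdf(beta0, mu, sd) + normal_logpdf(beta1, mu, sd)

# Evaluate both formulations at the same (arbitrary) point
point = (1.0, 2.5)
logp_vector = vector_prior_logp(point)
logp_separate = separate_prior_logp(*point)
print(logp_vector, logp_separate)
```

Both formulations add the same two terms to the joint log-density, so the sampler targets the same posterior either way. The practical difference is bookkeeping: one named variable with two entries versus two named variables.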

SiobhanDK · August 28, 2018, 9:25pm

I would like to follow up on this question to clarify something that has confused me. Does this mean that when you specify a distribution as having Shape 2, two independent distributions are created with the same prior, which are then trained independently of each other?

I want to make sure that Shape doesn’t simply mean "draw two samples from the same distribution". If you wanted to draw multiple samples from the same distribution in a model, what argument would you use? dim=2?

I ask as the documentation says that Shape is used to define the length or shape of the random variable, rather than initiating multiple random variables. I think this could be interpreted as a single random variable which is sampled multiple times.

Thanks for your time.

junpenglao · August 29, 2018, 4:42am

The shape issue is constantly a mess because we do not distinguish between, e.g., event shape, batch shape, etc. There is more thought in https://github.com/pymc-devs/pymc3/pull/2833 explaining what the ideal design should be.

As for your specific question, your understanding is correct (without nitpicking some of the terminology). You don’t draw samples from a distribution within the model block, as that’s done via the random method of a RandomVariable or a distribution. Intuitively, understand it as random variables that are constrained by some rules (i.e., the prior distribution), not as random generation from said rules.
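This also explains why the two entries of beta look so different in a trace plot even though they share a prior. The following is a plain-Python sketch, not PyMC3 code, with a hypothetical helper `posterior_mean` and made-up noise-free data. To keep the algebra closed-form it assumes a known noise sd, no intercept, and orthogonal predictors, none of which come from the tutorial model.

```python
def posterior_mean(x, y, sigma=1.0, prior_sd=10.0):
    """Posterior mean of one coefficient under a Normal(0, prior_sd) prior.

    Assumes known noise sd `sigma`, no intercept, and predictors that are
    orthogonal to each other, so each coefficient can be updated on its own.
    """
    precision = sum(xi * xi for xi in x) / sigma ** 2 + 1.0 / prior_sd ** 2
    return (sum(xi * yi for xi, yi in zip(x, y)) / sigma ** 2) / precision

# Illustrative data: two orthogonal predictors, true coefficients 1.0 and 2.5
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [1.0 * a + 2.5 * b for a, b in zip(x1, x2)]

# Same prior for both entries, but each is informed by different data
beta_means = [posterior_mean(x1, y), posterior_mean(x2, y)]
print(beta_means)
```

Both entries start from the identical Normal(0, 10) prior, yet the first posterior mean lands close to 1.0 and the second close to 2.5. They are two unknowns that obey the same rule, not two draws from one distribution.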

SiobhanDK · August 29, 2018, 7:05pm

Thanks for responding so soon. According to the terminology in the link, am I right then in equating Shape, as it currently behaves, with the behavior of param_shape?

Also, is there a distribution argument equivalent to the atom_shape proposed in the link that we could use now?

When you say Shape is a mess, does it perform differently depending on the use case? Are there cases where Shape would result in different behavior from that shown in the Model example above?

junpenglao · August 29, 2018, 8:31pm

Nope, currently you need to figure out the shape (param_shape + atom_shape) yourself and input it correctly into your model. This is what I meant by a mess: there are quite a few edge cases and bugs.
