Linear Regression - Difference between these two models?

Paras_Chopra · January 7, 2019, 12:09pm

Hello,

I’m doing linear regression and I’m wondering which is the correct / better way of observing independent variable (weight in my case, the dependent variable is height).

One way is to model independent variable as a (normal) distribution and conditioning on observed data:

m = pm.Model()

with m:

alpha = pm.Uniform('alpha', lower=0, upper=300)
beta = pm.Normal('beta', mu=0, sd=50)
sigma_height = pm.Uniform('sigma_height', lower=0, upper=50)

weight = pm.Normal('weight', mu=np.mean(d2['weight']), sd=np.std(d2['weight']), observed=d2['weight'])
mu_height = alpha + beta * (weight-np.mean(d2['weight']))

height = pm.Normal('height', mu=mu_height, sd=sigma_height, observed=d2['height'])

But I discovered that I can simply add weight directly from dataframe (without first modeling it as a normal distribution)

m = pm.Model()

with m:

alpha = pm.Uniform('alpha', lower=0, upper=300)
beta = pm.Normal('beta', mu=0, sd=50)
sigma_height = pm.Uniform('sigma_height', lower=0, upper=50)

mu_height = alpha + beta * (d2['weight']-np.mean(d2['weight']))

height = pm.Normal('height', mu=mu_height, sd=sigma_height, observed=d2['height'])

My questions are:

How are these two models treated by PyMC3 during MCMC inference? What’s the difference? (Specifically for the second one, since dataframe (d2[‘weight’]) has a list of values, how is mu_height calculated (since it is scalar)? During sampling, Is dataframe also sampled (uniformly?) to arrive at mu_height?
Which one should I be using for regression? Are there any pros and cons of each approach?

By the way, both of the approaches give similar estimations for unknowns (alpha, beta, sigma_height).

junpenglao · January 7, 2019, 3:39pm

The two model will be treated the same in MCMC, because

weight = pm.Normal('weight', mu=np.mean(d2['weight']), sd=np.std(d2['weight']), observed=d2['weight'])

Adds a constant to the model logp, thus has not effect during MCMC. Moreover, weight will be a tensor with value set to d2['weight'].

You should use the first one, because the second one does not add any new information and could be confusing. But if you have uncertainty in the covariates you want to model you can also do:

mu_weight = pm.Normal(...)
sd_weight = pm.HalfNormal(...)
weight = pm.Normal('weight', mu=mu_weight, sd=sd_weight, observed=d2['weight'])

Paras_Chopra · January 7, 2019, 3:51pm

Thanks. A quick follow up question on your comment:

Moreover, weight will be a tensor with value set to d2['weight'] .

Since, mu_height is determined by:

mu_height = alpha + beta * (d2[‘weight’]-np.mean(d2[‘weight’]))

I’m assuming mu_height will also be a tensor (not tensor) during sampling? Is my understanding correct?

In that case, what does mu=mu_height imply in:

height = pm.Normal(‘height’, mu=mu_height, sd=sigma_height, observed=d2[‘height’])

My assumption was that mu took a scalar value. How is mu_height tensor treated? (Also sigma_height isn’t a tensor and I’m not sure why that won’t cause a problem)

junpenglao · January 8, 2019, 2:05am

Yes

It will be a vector tensor with the shape same as the observed. Its value is propagated from other free parameters, so it is essentially invisible to the sampler (as it is not a free parameter).

It will be convert into a tensor constant during run time

Paras_Chopra · January 8, 2019, 6:13am

Thanks.

So this means if mu_height is of shape (100) containing 100 different data points, sigma_height will also be recasted as shape of (100) with the values replicated 100 times?

junpenglao · January 8, 2019, 7:50am

It will be broadcasted during the computation of logp. But will not be a shape (100) tensor the same sense as mu_height.

Topic		Replies	Views
Modeling of multiple regression model with array form v5 development , linear_model , shape_issue , modeling	12	1222	December 17, 2024
Model Fitting using datasets of different observation instruments Questions	11	2544	April 15, 2020
Create model matrix Questions	7	1490	June 23, 2021
Basic linear regression on fuel efficiency of cars/trucks. Trouble with dims v5 linear_model , shape_issue , modeling	1	463	April 22, 2023
Prior predicitive for linear regression Questions	2	527	May 22, 2019

Linear Regression - Difference between these two models?

Related topics