Hello,

I’m doing linear regression and I’m wondering which is the correct (or better) way to handle the observed independent variable (weight in my case; the dependent variable is height).

One way is to model the independent variable as a (normal) distribution and condition on the observed data:

```
import numpy as np
import pymc3 as pm

# d2 is my dataframe with 'weight' and 'height' columns
m = pm.Model()

with m:
    alpha = pm.Uniform('alpha', lower=0, upper=300)
    beta = pm.Normal('beta', mu=0, sd=50)
    sigma_height = pm.Uniform('sigma_height', lower=0, upper=50)
    # weight as a normal with fixed mu/sd, conditioned on the observed data
    weight = pm.Normal('weight', mu=np.mean(d2['weight']), sd=np.std(d2['weight']), observed=d2['weight'])
    mu_height = alpha + beta * (weight - np.mean(d2['weight']))
    height = pm.Normal('height', mu=mu_height, sd=sigma_height, observed=d2['height'])
```

But I discovered that I can simply use weight directly from the dataframe (without first modeling it as a normal distribution):

```
m = pm.Model()

with m:
    alpha = pm.Uniform('alpha', lower=0, upper=300)
    beta = pm.Normal('beta', mu=0, sd=50)
    sigma_height = pm.Uniform('sigma_height', lower=0, upper=50)
    mu_height = alpha + beta * (d2['weight'] - np.mean(d2['weight']))
    height = pm.Normal('height', mu=mu_height, sd=sigma_height, observed=d2['height'])
```

My questions are:

- How are these two models treated by PyMC3 during MCMC inference? What’s the difference? Specifically for the second one: since the dataframe column (d2['weight']) holds a list of values, how is mu_height calculated (since it should be a scalar)? During sampling, is the dataframe also sampled (uniformly?) to arrive at mu_height?
- Which one should I be using for regression? Are there any pros and cons of each approach?
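
To make the first question concrete: the alternative I can think of is that nothing is sampled from the dataframe and the expression is simply evaluated elementwise, so that mu_height holds one mean per row rather than a single scalar. Below is a plain-Python sketch of that guess. The weights and parameter values are made up for illustration (they are not from d2), and the explicit loop only stands in for whatever PyMC3 does internally.

```python
from statistics import mean

# Made-up values for illustration only; not taken from d2.
weights = [40.0, 50.0, 60.0]
alpha, beta = 150.0, 0.5  # one hypothetical draw of the parameters

# Centre the predictor, then apply the linear model row by row.
w_bar = mean(weights)
mu_height = [alpha + beta * (w - w_bar) for w in weights]

print(len(mu_height))  # one mean per observation, not a scalar
print(mu_height)
```

If this reading is right, each observed height would be scored against its own mean, and the same set of means would be recomputed for every new draw of alpha and beta.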

By the way, both approaches give similar estimates for the unknowns (alpha, beta, sigma_height).
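
One possible explanation, which I have not confirmed against PyMC3 internals: in the first model the weight node has fixed mu and sd, so its log-density would not involve alpha, beta or sigma_height at all and would only shift the log posterior by a constant. The sketch below checks that idea in plain Python. The data are made up, `normal_logpdf`, `loglik_first` and `loglik_second` are hypothetical helper names, and the priors are left out because they are identical in both models.

```python
import math

def normal_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)

# Made-up data for illustration only; not taken from d2.
weights = [40.0, 50.0, 60.0]
heights = [145.0, 151.0, 154.0]
w_bar = sum(weights) / len(weights)
# Population standard deviation, matching the default of np.std.
w_sd = math.sqrt(sum((w - w_bar) ** 2 for w in weights) / len(weights))

def loglik_second(alpha, beta, sigma):
    """Height likelihood only (second model)."""
    return sum(
        normal_logpdf(h, alpha + beta * (w - w_bar), sigma)
        for w, h in zip(weights, heights)
    )

def loglik_first(alpha, beta, sigma):
    """Height likelihood plus the fixed-parameter weight term (first model)."""
    weight_term = sum(normal_logpdf(w, w_bar, w_sd) for w in weights)
    return weight_term + loglik_second(alpha, beta, sigma)

# The gap between the two log-likelihoods at two different parameter settings.
diff_a = loglik_first(150.0, 0.5, 5.0) - loglik_second(150.0, 0.5, 5.0)
diff_b = loglik_first(140.0, 1.2, 8.0) - loglik_second(140.0, 1.2, 8.0)
print(math.isclose(diff_a, diff_b))
```

If the gap is the same everywhere, the two models would have the same posterior over the unknowns, which would be consistent with the similar estimates I see.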