Beginner Question / Linear Regression + Causal DAG

Hi all, I’ve been working through Statistical Rethinking and doing the corresponding problems. My initial belief is that age, at least in the young, has a significant impact on height, which in turn has an impact on weight. Thinking about it further, I remembered that old people sometimes “shrink” due to bad posture and other causes. My first thought was an exponential prior, but I’m now leaning towards something roughly Gaussian, since I couldn’t wrap my head around how the exponential prior would work out in practice. The “shrinking” aspect also makes me wonder whether it’s possible to build something roughly sinusoidal, with the negative half less pronounced than the positive, since people tend to “thin” out as they age as they lose volume in their face (less collagen etc.) and presumably in other places.

Part of my confusion is how to handle the height variable: I want to capture the effect of age on weight, but I suppose I now need to do that both directly and indirectly (through height). I also wonder whether using \beta2 in \mu_height makes sense; intuitively I think that’s probably incorrect. The code I have runs without crashing, but I’m not yet good at diagnosing models and checking that they make sense, and I have a feeling it’s probably wrong.

Attached are 3 images, in order: the question prompt, my attempted answer to the prompt (which is probably a mess), and the previous answer the model is based on (for which I consulted bayesiancomputationbook). To whoever reads, thank you!

So I’d use pm.model_to_graphviz to help diagnose any structural issues in how you’ve composed the model.

On the relationships between age and height and between age and weight: at the moment you are assuming they are linear. There’s no reason why you can’t extend that to a quadratic, but that is a modelling choice. The approach is to plot the data you have, or data from published studies, and use that to inform your modelling choices.
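As a quick exploratory check before touching the Bayesian model, you could compare a straight-line fit against a quadratic one. The sketch below uses plain least squares from NumPy on made-up data (the numbers are invented for illustration and are not from any study), so treat it as a way to look at curvature, not as the model itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data for illustration only: height rises with age, flattens, then dips.
age = rng.uniform(0, 80, size=200)
height = 80 + 4.0 * age - 0.04 * age**2 + rng.normal(0, 5, size=200)

# Centre age so the linear and quadratic terms are less correlated.
age_c = age - age.mean()

# Least-squares fits; np.polyfit returns coefficients from highest degree down.
lin = np.polyfit(age_c, height, deg=1)
quad = np.polyfit(age_c, height, deg=2)


def rss(coefs):
    """Residual sum of squares of a polynomial fit on the centred ages."""
    resid = height - np.polyval(coefs, age_c)
    return float(np.sum(resid**2))


rss_lin, rss_quad = rss(lin), rss(quad)
print(f"quadratic coefficient: {quad[0]:.3f}")
print(f"RSS linear: {rss_lin:.0f}, RSS quadratic: {rss_quad:.0f}")
```

A clearly negative quadratic coefficient together with a much smaller residual sum of squares would suggest the "grow, then shrink" shape is worth modelling. In a PyMC model the equivalent change would be adding a squared-age term to the linear predictor, with its own coefficient and prior.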

I’d also recommend running prior and posterior predictive checks and making extensive use of plotting to get more insight into whether the model is behaving how you intend. You can iterate from there.
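To show the idea behind a prior predictive check, here is a hand-rolled sketch in NumPy that compares two candidate slope priors for a linear age -> height relationship. In PyMC you would normally use `pm.sample_prior_predictive` and `pm.sample_posterior_predictive` instead; this version only simulates from the priors directly. The priors, the age range, and the plausibility bounds of 30 to 200 cm are all assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
ages = np.linspace(0, 12, 50)  # hypothetical: children aged 0-12 years

# Hypothetical priors for mu_height = alpha + beta * age (cm, years).
alpha = rng.normal(60.0, 10.0, size=n_draws)
beta_normal = rng.normal(0.0, 10.0, size=n_draws)       # allows children to shrink
beta_positive = rng.exponential(scale=5.0, size=n_draws)  # growth only


def implied_heights(alpha, beta):
    """Expected height for every prior draw (rows) at every age (columns)."""
    return alpha[:, None] + beta[:, None] * ages[None, :]


def share_implausible(mu, low=30.0, high=200.0):
    """Fraction of prior draws whose line leaves [low, high] cm at any age."""
    outside = (mu < low) | (mu > high)
    return float(outside.any(axis=1).mean())


share_normal = share_implausible(implied_heights(alpha, beta_normal))
share_positive = share_implausible(implied_heights(alpha, beta_positive))
print(f"implausible share, Normal(0, 10) slope: {share_normal:.2f}")
print(f"implausible share, Exponential slope:   {share_positive:.2f}")
```

Plotting a few dozen rows of `implied_heights(...)` against `ages` gives the usual spaghetti plot of prior lines, which tends to make it obvious when a prior allows negative heights or shrinking children. The same comparison against the observed data after fitting is the posterior predictive check.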

You might find my mediation analysis example useful?