Hi all,
I’m currently investigating how Bayesian statistics and PyMC can be used in an analysis of extreme weather events. In broad strokes, the current analysis involves fitting a distribution to some pooled data (m x n) using a bootstrap, and performing a linear regression between each member series in the data and a predictor (n), i.e. m separate linear regressions against the same predictor.
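For context, the non-Bayesian version of that regression step looks roughly like this (a minimal sketch; the array names and shapes are only illustrative, matching the model code further down):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative shapes: n = 50 observations, m = 100 member series
n, m = 50, 100
data = rng.normal(size=(n, m))   # pooled data, one member series per column
predictor = rng.normal(size=n)   # shared predictor

# One ordinary least-squares fit per member series,
# i.e. m separate regressions against the same predictor
slopes = np.empty(m)
intercepts = np.empty(m)
for i in range(m):
    slopes[i], intercepts[i] = np.polyfit(predictor, data[:, i], deg=1)
```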
From the examples in the documentation (thanks for these!) I have figured out how to fit a distribution and perform a linear regression in PyMC, so far so good. For the linear regression this means I have priors for the slope and intercept with shape=m, so each regression has its own distributions for slope and intercept.
In a later stage of the analysis I want to sample from all the slopes of the regressions together. As I understand it, a mixture with equal weights could be used for this, but I’m not sure that’s the right way to go: testing it leads to a lot of divergences. My code for the linear regression looks like this:
```python
import numpy as np
import pymc as pm

# data is a 50 x 100 array
# predictor is a 50 x 1 array, broadcast to 50 x 100
with pm.Model():
    # Coefficients: one slope and intercept per member series
    a = pm.Normal("slopes", mu=0, sigma=10, shape=100)
    b = pm.Normal("intercepts", mu=0, sigma=10, shape=100)
    err = pm.HalfCauchy("sigma", beta=10, initval=1.0)

    # Predictor
    x_ = pm.ConstantData("predict", predictor)

    # The linear regression
    func = a * x_ + b

    # Likelihood
    obs = pm.Normal(
        "obs",
        mu=func,
        sigma=err,
        observed=data,
    )

    # Running without this part works with no divergences
    w = pm.Dirichlet("w", a=np.ones(100))
    pooled_slopes = pm.NormalMixture("pooled_slopes", w=w, mu=a, sigma=20)

    # Sample the model
    idata = pm.sample()
```
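What I’m ultimately after is essentially the following post-processing step; I’m not sure whether the in-model mixture is the right way to get there, or whether pooling the posterior draws like this would be equivalent (a sketch; `idata` is the trace from the model above):

```python
# Pool the posterior slope draws directly, instead of adding a
# mixture node to the model (assumes idata from pm.sample() above)
import numpy as np

slope_draws = idata.posterior["slopes"].values  # shape: (chains, draws, 100)
pooled = slope_draws.reshape(-1)                # flatten across chains, draws, series

# Resample from the pooled draws; each series contributes equally
rng = np.random.default_rng(0)
pooled_samples = rng.choice(pooled, size=10_000, replace=True)
```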
Very new to the world of Bayesian statistics, so grateful for any input!