Hello,
after playing around with PyMC for a while I am now trying to apply PyMC v5 to a real-world problem. However, even if I think the model is pretty simple, no matter what I do, sampling always fails (one or more variables have RSS values in the range 1.1 to 3, effective sample sizes are sometimes <<100, I see patterns in trace plots and a lot divergences using NUTS).
The problem can be stated as follows: Find the parameters of a quadratic regression function
E\left[y\right]=f\left(x\right)=\beta_{0}+\beta_{1}x+\beta_{2}x^{2}
The prior information I want to encode:
- The target variable y does not have any uncertainty, but the independent variable x has.
- x\in\left[0,1\right], y may be unconstraint.
- The vertex x_{0} of the graph must be in the interval \left[\frac{1}{2},1\right] and it must be the global maximum and there are no observations expected with x values greater than the vertex. (I.e., if x_{0} is 0.7, all possible x-values are expected within \left[0,0.7\right].)
- This is a hierarchical model: Each group of data has its own vector of coefficients, \beta=\left(\beta_{0},\beta_{1},\beta_{2}\right), as well vertex x_{0}.
The following relations can be derived from this:
- \beta_{2}<0
- \beta_{1}=-2x_{0}\beta_{2}
The current state of the model looks like this, which tries to encode the prior information via the variables x_{0} and \beta_{2}.
q = 0.5
n_obs = len(x)
n_groups = len(grps)
# initial values
beta_2_init, beta_1_init, beta_0_init = np.polyfit(x, y, 2)
x0_init = -0.5*beta_1_init/beta_2_init
x1_init = x0_init + 0.5*h1/beta_2_init
with pm.Model() as model:
# x0
theta_0_alpha = pm.Uniform("theta_0_alpha", lower=0.1, upper=100, initval=16.7631)
theta_0_beta = pm.Uniform("theta_0_beta", lower=2, upper=100, initval=5.5099)
theta_0 = pm.Beta("theta_0", alpha=theta_0_alpha, beta=theta_0_beta, shape=n_groups)
x0 = pm.Deterministic("x0", q + theta_0*(1 - q))
x0_ = x0 - x_offset
# beta_2
mu_beta_2 = pm.Uniform("mu_beta_2", lower=np.max([0, -0.5*beta_2_init]), upper=-1.5*beta_2_init)
sigma_beta_2 = pm.HalfNormal("sigma_beta_2", sigma=50)
beta_2_neg = pm.InverseGamma("beta_2_neg", mu=mu_beta_2, sigma=sigma_beta_2, shape=n_groups)
beta_2 = pm.Deterministic("beta_2", -beta_2_neg)
# x
sigma_sigma_x = 0.05
sigma_x = pm.HalfNormal("sigma_x", sigma_sigma_x)
x_lat = pm.Uniform("x_lat", lower=-x_offset*np.ones(n_obs) - sigma_sigma_x, upper=x0_[grp_idx] + sigma_sigma_x)
x_obs = pm.Normal("x_obs", mu=x_lat, sigma=sigma_x, observed=x)
# y
sigma_beta_0 = pm.HalfNormal("sigma_beta_0", 100)
beta_1 = pm.Deterministic("beta_1", -2*x0_*beta_2)
beta_0 = pm.Normal("beta_0", mu=beta_0_init, sigma=sigma_beta_0, shape=n_groups)
y_pred = pm.Deterministic("y_pred", beta_0[grp_idx] + beta_1[grp_idx] * x_lat + beta_2[grp_idx] * x_lat**2)
y_obs = pm.Normal("y_obs", mu=y_pred, sigma=5, observed=y)
Here x_offset
is the mean of the original x values of all observations and groups. I.e., I the model works on the shifted x values x - x_offset
in order to be able to set the prior mean of \beta_{0} to zero (otherwise I would have no specific prior knowledge about this parameter).
Please note that I could not find out how to encode that y is actually noise-free, so I decided to give it at least a bit of Gaussian noise. This seems to be important for sampling, because setting the variance of y_pred
to, e.g., 0.001, the results are even worse.
Sampling:
with model:
trace = pm.sample(3000, tune=6000, chains=4, cores=4, nuts_sampler_kwargs={"target_accept": 0.99})
I don’t know what else to try and would appreciate any help. Thanks.