In a geophysical application, there is a complex relationship between two variables. One variable, which I'll call y here, is a function of x (plus uncertainty), but the functional relationship is not easily parameterized, so I would like to model it with a lookup table. Eventually I want to add more variables and fit real data, but here I am only interested in how to implement a lookup table, or something equivalent.
Here is a simple NumPy example of what I would like to implement in PyMC:
import numpy as np

# values for x in the lookup table
lut_x = np.linspace(0.0, 1.0, 11)
# corresponding values for y (using a simple relationship here)
lut_y = 0.1 * np.sin(2.0 * np.pi * lut_x)

x = np.random.uniform()

# find the upper bracketing index for x
ilut = 1
while x > lut_x[ilut]:
    ilut += 1

# linear interpolation between the bracketing table entries
weight = (lut_x[ilut] - x) / (lut_x[ilut] - lut_x[ilut - 1])
mu = weight * lut_y[ilut - 1] + (1.0 - weight) * lut_y[ilut]
y = np.random.normal(mu, 0.1)
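(As an aside, the manual search and weighting above is just linear interpolation, so I can sanity-check it against `np.interp`:)

```python
import numpy as np

lut_x = np.linspace(0.0, 1.0, 11)
lut_y = 0.1 * np.sin(2.0 * np.pi * lut_x)

def lut_mu(x):
    """Manual table lookup with linear interpolation, as in the example above."""
    ilut = 1
    while x > lut_x[ilut]:
        ilut += 1
    weight = (lut_x[ilut] - x) / (lut_x[ilut] - lut_x[ilut - 1])
    return weight * lut_y[ilut - 1] + (1.0 - weight) * lut_y[ilut]

x = np.random.uniform()
# np.interp gives the same interpolated mean as the manual search
assert np.isclose(lut_mu(x), np.interp(x, lut_x, lut_y))
```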
Now, I have tried something similar in PyMC, but it either does not run or does not converge properly.
For example (leaving out the linear interpolation):
import numpy as np
import pymc as pm
import arviz
import aesara.tensor as at

coords = {'lut': np.linspace(0.0, 1.0, 11)}
model = pm.Model(coords=coords)
with model:
    # values for x in the lookup table
    lut_x = pm.Data('lut_x', coords['lut'], dims=('lut',), mutable=False)
    # corresponding values for y (using a simple relationship here)
    lut_y = pm.Data('lut_y', 0.1 * np.sin(2.0 * np.pi * coords['lut']),
                    dims=('lut',), mutable=False)
    x = pm.Uniform('x', lower=0.0, upper=1.0)
    # a Python-level loop over a symbolic condition -- this is where it breaks
    ilut = 1
    while at.gt(x, lut_x[ilut]):
        ilut += 1
    y = pm.Normal('y', mu=lut_y[ilut], sigma=0.01)
    idata = pm.sample(1000)
print(arviz.summary(idata))
This either refuses to start or gets stuck early in sampling. A variant using
ilut = pm.Deterministic('ilut', at.argmin(at.abs(lut_x - x)))
is slow, does not converge, or does not sample the full range of x.
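One idea I am experimenting with (not yet in the model above): since lut_x is an evenly spaced grid, the bracketing index can be computed with arithmetic instead of a search loop, which should keep everything expressible as tensor ops (`at.floor`, `at.clip`, tensor indexing). A NumPy sketch of that index arithmetic; the clamp at the top of the grid is my own addition to keep x == 1.0 in range:

```python
import numpy as np

lut_x = np.linspace(0.0, 1.0, 11)
lut_y = 0.1 * np.sin(2.0 * np.pi * lut_x)

def interp_uniform(x):
    """Loop-free linear interpolation on an evenly spaced grid in [0, 1]."""
    n = len(lut_x) - 1
    pos = x * n                          # fractional index into the grid
    i = min(int(np.floor(pos)), n - 1)   # clamp so x == 1.0 stays in range
    w = pos - i                          # interpolation weight in [0, 1]
    return (1.0 - w) * lut_y[i] + w * lut_y[i + 1]

# agrees with np.interp on the same table
assert np.isclose(interp_uniform(0.37), np.interp(0.37, lut_x, lut_y))
```

Whether the gradient that flows through the weight is enough for NUTS to move across table cells is exactly what I am unsure about.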
I realize that a lookup table is probably not ideal for the sampler, but it works well in Stan, which I am trying to move away from. So probably, there’s a technique that I am not yet aware of – I looked at
but that’s not quite what I’d like to achieve.
For the record (and for those interested), here is a PyStan (v2) version of this simple model that converges and produces the desired output:
import numpy as np
import pystan

stan_code = '''
data {
    int nlut;
    real lut_x[nlut];
    real lut_y[nlut];
}
parameters {
    real<lower=0.0, upper=1.0> x;
    real y;
}
model {
    x ~ uniform(0.0, 1.0);
    {
        int ilut;
        real weight;
        real mu;
        // find the upper bracketing index for x (Stan arrays are 1-based)
        ilut = 2;
        while (x > lut_x[ilut]) {
            ilut += 1;
        }
        // linear interpolation between the bracketing table entries
        weight = (lut_x[ilut] - x) / (lut_x[ilut] - lut_x[ilut - 1]);
        mu = weight * lut_y[ilut - 1] + (1.0 - weight) * lut_y[ilut];
        y ~ normal(mu, 0.1);
    }
}
'''
# values for x in lookup table
lut_x = np.linspace(0.0, 1.0, 11)
# corresponding values for y (using a simple relationship here)
lut_y = 0.1*np.sin(2.0 * np.pi * lut_x)
stan_data = {
'nlut': len(lut_x),
'lut_x': lut_x,
'lut_y': lut_y,
}
model = pystan.StanModel(model_code=stan_code)
fit = model.sampling(data=stan_data, iter=4000, chains=4)
results = fit.extract()
print(fit)