Hi,
I have some data whose functional form I know, but only in some interval—outside the interval I do not have good control of the functional form, and neither do I a priori know what the interval is. Up until now I’ve been running a separate fit for each possible window of data, and then looking for plateaux/islands of stability, but the results of this process are hard to visualise, reason about, or derive a single final value from. I’d like to be able to define a single model that treats the fit region bounds as parameters on the same footing as the other parameters to the fit form.
I’ve put together a relatively minimal example of what I’m trying to achieve. Constructing some sample data as:
import matplotlib.pyplot as plt
import numpy as np
def weird_function(x, gate1, gate2, A, B):
return (
((gate1 < x) & (x < gate2)) * (A * (x - gate1) + B)
+ (x < gate1) * B * (1 - np.sin((gate1 - x) * 3) / 10)
+ (x > gate2) * (B + (gate2 - gate1) * A + A / 5 * np.sin((x - gate2) * 10) - (x - gate2) ** 2 / 4)
)
A = 2
B = 3
gate1 = 3
gate2 = 7
xdata = np.linspace(0, 10, 50)
ydata = weird_function(xdata, gate1, gate2, A, B)
yerror = np.random.random(len(xdata)) / 3 + 0.2
plt.errorbar(xdata, ydata, yerr=yerror, fmt='.')
plt.show()
I’d like the model to locate and fit parameters for the straight line region, giving back values for A, B, gate1, and gate2. (The real data I want to apply this to aren’t linear; this is a simplification for the sake of readability here.)
The model I’ve tried is:
import pymc3 as pm
import theano.tensor as tt
import theano
xtensor = theano.shared(xdata)
ytensor = theano.shared(ydata)
yerrtensor = theano.shared(yerror)
xmin = min(xdata)
xmax = max(xdata)
xspan = xmax - xmin
model = pm.Model()
with model:
A = pm.Normal('A', mu=5, sigma=2)
B = pm.Normal('B', mu=5, sigma=2)
x_lower_bound = xmin + pm.Gamma('x_lower_bound', mu=xspan / 10, sigma=xspan / 2)
x_upper_bound = xmax - pm.Gamma('x_upper_bound', mu=xspan / 10, sigma=xspan / 2)
x_to_fit = pm.Deterministic('x_to_fit', xtensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])
y_to_fit = pm.Deterministic('y_to_fit', ytensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])
yerr_to_fit = pm.Deterministic('yerr_to_fit', yerrtensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])
y_form = pm.Deterministic('y_form', A * x_to_fit + B)
y_obs = pm.Normal('y_obs', mu=y_form, sigma=yerr_to_fit, observed=y_to_fit)
trace = pm.sample(draws=1000, tune=1000)
This completes on the sample data, but generates warnings and doesn’t give good fit results—the fit bounds seem to try and run away.
It fails on my real data, with the error “could not broadcast input array from shape (1840) into shape (2720)”, presumably because the fit region changing affects the number of points being fitted, which is assumed constant; I suspect this isn’t seen in the example case because the bound never gets to the point that the data change shape?
I think I’m doing this wrong, but can’t work out how to do it properly. Can anyone offer any suggestions, please?
Thanks!