Hello,
I am new to PyMC3 and it would be great if someone could provide some guidance/feedback on my implementation below! I am building a package that takes a few functions (~5 or so in practice) that are assumed to be draws of the same Gaussian process, and attempts to learn some of the hyperparameters that are physically relevant to our model. Right now, my ObservableModel class uses a for loop (which I know is discouraged) to create multiple marginal_likelihood instances, but NUTS becomes quite slow with 2-dimensional inputs, which is where most of these functions will be fit.
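For reference, the looped version looks roughly like this (a minimal sketch with toy data and made-up priors, not my actual ObservableModel code):

import numpy as np
import pymc3 as pm

# Toy stand-ins just for illustration: 3 functions observed at N points in d = 1
N, d, n_funcs = 25, 1, 3
X = np.linspace(0, 1, N)[:, None]     # (N, d) inputs shared by every function
data = np.random.randn(n_funcs, N)    # (n, N) stack of functions (random here, smooth in practice)
noise = 1e-10                         # the observations are essentially noiseless

with pm.Model() as loop_model:
    ls = pm.Gamma('ls', 2.0, 1.0)         # placeholder prior on the length scale \ell
    cbar = pm.HalfNormal('cbar', 1.0)     # placeholder prior on the marginal std \bar c
    cov = cbar**2 * pm.gp.cov.ExpQuad(d, ls)
    gp = pm.gp.Marginal(cov_func=cov)
    # One marginal_likelihood per function -- this is the part that gets slow
    for i, y_i in enumerate(data):
        gp.marginal_likelihood('c_{}'.format(i), X=X, y=y_i, noise=noise)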
Right now, it looks like marginal_likelihood is technically only built for one function y at a time. But I imagine the "best" way to do this would be something like the following:
X = ...       # An N x d array, where N is the number of data points and d is the number of dimensions
data = ...    # An n x N array, where n is the number of functions
noise = 1e-10
with pm.Model() as model:
    # Define RVs
    cov = ...
    gp = pm.gp.Marginal(cov_func=cov)
    gp.marginal_likelihood(
        name,
        X=X,
        y=data,  # <- Not the right shape according to the documentation, but still "works"
        noise=noise,
        shape=data.shape
    )
which is how MvNormal works, rather than a for func in data: ... loop. This actually does sort of seem to work (and is much faster than the for loop), but I have to cheat a little to get other gp methods, such as conditional, to work right:
with model:
    for func in data:
        gp.conditional(new_name, X_new, given={'y': func, 'X': X, 'noise': noise})
Here's an example of the output on some toy data using conditional, where the dots are observations:
The \bar c variable is just the marginal standard deviation of the gp and the \ell is the length scale.
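For completeness, the conditional curves in that plot come from roughly the following (continuing from the block above; X_new, the sampler settings, and the names pred_0, pred_1, ... are just placeholders):

import matplotlib.pyplot as plt
import numpy as np
from pymc3.gp.util import plot_gp_dist

X_new = np.linspace(0, 1, 100)[:, None]   # prediction grid (placeholder)

with model:
    trace = pm.sample(1000, tune=1000)    # fit the hyperparameters first

with model:
    preds = [gp.conditional('pred_{}'.format(i), X_new,
                            given={'y': func, 'X': X, 'noise': noise})
             for i, func in enumerate(data)]
    # pm.sample_ppc in older PyMC3 versions
    ppc = pm.sample_posterior_predictive(trace, vars=preds, samples=200)

fig, axes = plt.subplots(len(data), 1, sharex=True)
for i, ax in enumerate(np.atleast_1d(axes)):
    plot_gp_dist(ax, ppc['pred_{}'.format(i)], X_new[:, 0])  # shaded GP band per function
    ax.plot(X[:, 0], data[i], 'ko')                          # the observed points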
(1) So my first question is this: is this the correct way to make multiple observations of the same GP without breaking things behind the scenes? If so, can/should the gp methods such as conditional and plot_gp_dist be updated to handle this? I would really like to keep the perks of gps rather than using MvNormal.
Using the above would all be well and good if it weren't for another aspect of our model: a special type of uncertainty in the observations themselves. There is an unknown scaling factor associated with the observations that affects each function differently. This is what is handled by the ExpansionParameterModel class in the package. To describe it I will go into the details of the model a bit.
We are measuring what is generically called an "observable" Z (of which there can be many, and which are functions of either one or two variables), whose value can be calculated to some order k in an expansion. The expansion looks like

Z_k = Z_0 + \Delta Z_1 + \Delta Z_2 + \cdots + \Delta Z_k, \qquad \Delta Z_n \propto c_n Q^n,

where the Z_0 and \Delta Z_i are known, fixed quantities, and the c_n are found by calculating Z up to order n and inverting the equation. In this case, the functions that are all drawn from the same Gaussian process are the c_n, but they only have that property if Q is chosen correctly. The variable Q = p/\Lambda_b is called the expansion parameter, and is the ratio of two scales p and \Lambda_b. It is some number between 0 and 1 that is identical across observables Z. We have a pretty good (but not perfect!) idea of what it is. If, instead of the "correct" Q, we chose the value Q/a, then each of the c_n would be scaled by a^n, making them have different variances and thus be drawn from different, but related, GPs.
Right now I do not know how to implement this scaling factor a^n that affects each c_n differently without using a for loop. Currently I loop through the c_n, creating a gp with a properly scaled covariance for each and then calling marginal_likelihood, which seems to work okay but, again, is quite slow. In a perfect world, I would use the exact same implementation as described in the above "best" case, but where the observed y values are themselves uncertain: c_n * a^n for some RV a.
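For reference, the current slow implementation looks roughly like this (again just a sketch; the prior on a is a placeholder, and in the real code it comes from \Lambda_b through the ExpansionParameterModel class):

with pm.Model() as scaled_model:
    ls = pm.Gamma('ls', 2.0, 1.0)        # length scale \ell
    cbar = pm.HalfNormal('cbar', 1.0)    # marginal std \bar c of the correctly scaled c_n
    a = pm.Lognormal('a', 0.0, 0.5)      # placeholder prior for the Q mis-scaling factor
    base_cov = cbar**2 * pm.gp.cov.ExpQuad(d, ls)
    for n, c_n in enumerate(data):
        # The n-th coefficient is scaled by a^n, so its covariance picks up a factor a^(2n)
        cov_n = a**(2 * n) * base_cov
        gp_n = pm.gp.Marginal(cov_func=cov_n)
        gp_n.marginal_likelihood('c{}_scaled'.format(n), X=X, y=c_n, noise=noise)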
Below is an example of the predictions with that uncertain scaling parameter learned as described above (\Lambda_b is the uncertain scale):
Again, the dots are the "true" values, and the GP curves (which are no longer forced to go through those points because of the unknown scale) tried to guess a scale that makes all the curves look like draws from a single GP.
(2) Does anyone know a better way to implement this that would ease the sampling issues in NUTS? I am not sure whether scan in Theano is appropriate, or whether my model needs a completely different parameterization. Maybe building a custom likelihood would work, but I would still like to harness the power of GPs. Sorry if I did not describe this as well as I could have; I am new to PyMC3.
Thanks!