Hello,

I am new to PyMC3 and it would be great if someone could provide some guidance/feedback on my implementation below! I am building a package that takes a few (~5 or so, in practice) functions that are assumed to be draws of the *same* Gaussian process, and attempts to learn some of the hyperparameters that are physically relevant to our model. Right now, my `ObservableModel` class uses a `for` loop (which I know is discouraged) to create multiple `marginal_likelihood` instances, but NUTS becomes quite slow with 2-dimensional inputs, which is where most of these functions will be fit.

As far as I can tell, `marginal_likelihood` is technically only built for one function `y` at a time. But I imagine the "best" way to do this would be something like the following:

```
X = ...      # An N x d array, where N is the number of data points and d is the number of dimensions
data = ...   # An n x N array, where n is the number of functions
noise = 1e-10

with pm.Model() as model:
    # Define RVs
    cov = ...
    gp = pm.gp.Marginal(cov_func=cov)
    gp.marginal_likelihood(
        name,
        X=X,
        y=data,  # <- Not the right shape according to the documentation, but still "works"
        noise=noise,
        shape=data.shape
    )
```

which is how `MvNormal` works, rather than `for func in data: ...`. This actually does sort of seem to work (and is much faster than the `for` loop), but I have to cheat a little to get other gp methods, such as `conditional`, to work right:

```
with model:
    # Each conditional RV needs a unique name, hence the index
    for i, func in enumerate(data):
        gp.conditional('{}_{}'.format(new_name, i), X_new,
                       given={'y': func, 'X': X, 'noise': noise})
```
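As a sanity check on why passing a batched `y` "works": `n` functions that are i.i.d. draws of one GP have a joint log-density equal to the sum of `n` identical zero-mean `MvNormal` terms sharing a single `N x N` kernel matrix. This can be verified in plain numpy (a minimal sketch; all names and the toy kernel below are mine, not part of the package or PyMC3):

```python
import numpy as np

# n curves drawn i.i.d. from one GP: their joint log-density is the sum of
# n identical MvNormal log-densities with one shared N x N kernel matrix K,
# which is what stacking them into an n x N `y` is implicitly relying on.
rng = np.random.default_rng(0)
N, n = 20, 5
X = np.linspace(0, 1, N)[:, None]

def rbf(X, ell=0.3, var=1.0):
    # Squared-exponential kernel with jitter for numerical stability
    d2 = (X - X.T) ** 2
    return var * np.exp(-0.5 * d2 / ell**2) + 1e-10 * np.eye(len(X))

K = rbf(X)
L = np.linalg.cholesky(K)
data = (L @ rng.standard_normal((N, n))).T         # n x N, like `data` above

def logp_one(y):
    # log N(y | 0, K), computed via the Cholesky factor of K
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

loop_logp = sum(logp_one(y) for y in data)         # one MvNormal per curve

# Batched: solve for all curves at once; only the quadratic term varies with y
alpha = np.linalg.solve(L, data.T)                 # N x n
batch_logp = (-0.5 * (alpha**2).sum()
              - n * np.log(np.diag(L)).sum()
              - 0.5 * n * N * np.log(2 * np.pi))
```

The loop and batched computations agree, which is why the batched version can be so much faster: one Cholesky factorization is shared across all curves instead of being repeated per `marginal_likelihood`.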

Here's an example of the output from `conditional` on some toy data, where the dots are observations:

The $\bar c$ variable is just the marginal standard deviation of the GP and $\ell$ is the length scale.

**(1)** So my first question is this: is this the correct way to make multiple observations of the same GP without breaking things behind the scenes? If so, can/should the gp methods such as `conditional` and `plot_gp_dist` be updated to handle this? I would really like to keep the perks of gps rather than using `MvNormal`.

Using the above would all be well and good if it weren't for another aspect of our model: a special type of uncertainty in the observations themselves. There is an unknown scaling factor associated with the observations that affects each function differently; this is what is handled by the `ExpansionParameterModel` class in the package. To describe it, I will go into the details of the model a bit.

We are measuring what is generically called an "observable" `Z` (of which there can be many; each is a function of one or two variables), whose value can be calculated to some order `k` in an expansion. The expansion looks like

$$Z_k = Z_0 + \Delta Z_1 + \Delta Z_2 + \cdots + \Delta Z_k, \qquad \Delta Z_n = Z_0 \, c_n \, Q^n,$$

where the `Z_0` and `\Delta Z_i` are **known, fixed** quantities, and the `c_n` are found by calculating `Z` up to order `n` and inverting the equation. In this case, the functions that are all drawn from the same Gaussian process are the `c_n`, **but they only have such a property if** the variable `Q` is chosen correctly. `Q = p/\Lambda_b` is called the expansion parameter, and is the ratio of two scales `p` and `\Lambda_b`. It is some number between 0 and 1 that is identical across observables `Z`. We have a pretty good (but not perfect!) idea of what it is. If instead of the "correct" `Q` we chose the value `Q/a`, then each of the `c_n` would be scaled by `a^n`, making them have different variances and thus be drawn from different, but related, GPs.
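For concreteness, the inversion step can be sketched in a few lines of numpy. This is a hedged sketch under my assumed normalization $\Delta Z_n = Z_0 c_n Q^n$; the function name and numbers are hypothetical, not from the package:

```python
import numpy as np

# Assumed normalization (mine, for illustration): Delta Z_n = Z_0 * c_n * Q**n,
# so the dimensionless coefficients are recovered by inverting:
#     c_n = Delta Z_n / (Z_0 * Q**n)
def extract_coefficients(Z0, delta_Z, Q):
    """delta_Z[i] is the known order-(i+1) correction Delta Z_{i+1}."""
    n = np.arange(1, len(delta_Z) + 1)
    return np.asarray(delta_Z) / (Z0 * Q ** n)

# Round trip with synthetic numbers: build corrections from chosen c_n,
# then invert them back.
Z0, Q = 2.0, 0.5
c_true = np.array([0.8, -1.1, 0.4])
delta_Z = Z0 * c_true * Q ** np.arange(1, 4)
c_rec = extract_coefficients(Z0, delta_Z, Q)
```

With the correct `Q`, the recovered `c_n` are exactly the coefficients that are modeled as draws from the shared GP.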
Right now I do not know how to implement this scaling factor `a^n`, which affects each `c_n` differently, without using a `for` loop. Currently I loop through the orders, creating a `gp` with a covariance that is properly scaled, and then call `marginal_likelihood`, which seems to work okay but, again, is quite slow. In a perfect world, I would use the exact same implementation as described in the "best" case above, but where the observed `y` values are themselves uncertain: `c_n * a^n` for some RV `a`.
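The effect of a mis-chosen `Q` can be checked numerically. This is a pure-numpy illustration under the same assumed normalization $\Delta Z_n = Z_0 c_n Q^n$ (all names are mine): the corrections are fixed data, so extracting coefficients with `Q/a` instead of `Q` inflates `c_n` by `a^n`, which is why the order-`n` curve's covariance gets scaled by `a^{2n}` in the current loop:

```python
import numpy as np

# The corrections Delta Z_n are fixed, observed quantities, so re-extracting
# coefficients with the "wrong" expansion parameter Q/a inflates them:
#     Delta Z_n / (Z0 * (Q/a)**n) = c_n * a**n
Z0, Q, a = 2.0, 0.5, 1.3
c_true = np.array([0.8, -1.1, 0.4])
n = np.arange(1, 4)
delta_Z = Z0 * c_true * Q ** n            # the observed, fixed corrections

c_wrong = delta_Z / (Z0 * (Q / a) ** n)   # extract with Q/a instead of Q
# c_wrong == c_true * a**n, so each order's variance is scaled by a**(2n):
# a shared-GP model must scale its covariance per curve, K_n = a**(2n) * K.
```

This is exactly the per-order covariance scaling the current `for` loop implements, and what I would like to express without the loop.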

Below is an example of the predictions with that uncertain scaling parameter learned as described above ($\Lambda_b$ is the uncertain scale):

Again, the dots are the "true" values, and the GP curves (which are no longer forced to go through those points because of the unknown scale) tried to guess a scale that makes all the curves look like draws from a single GP.

**(2)** Does anyone know a better way to implement this that would ease the sampling issues in NUTS? I am not sure whether `scan` in Theano is appropriate, or whether some completely different parameterization of my model is needed. Maybe building a custom likelihood would work, but I would still like to harness the power of GPs. Sorry if I did not describe this as well as I could have; I am new to PyMC3.

Thanks!