Plotting the prior likelihood vs target in pymc

Hello,

I am trying to port the piece of code below (written in PyMC3) to the latest PyMC version:

with model_2:
    prior_pred = pm.sample_prior_predictive()
    
fig, ax = plt.subplots(figsize = (20, 8))
_ = ax.plot(prior_pred["likelihood"].T, color = "0.5", alpha = 0.1)
_ = ax.plot(data_transformed[target].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX], color = "red")

And below is my code:

with model_2:
    prior_pred = pm.sample_prior_predictive()

fig, ax = plt.subplots(figsize = (20, 8))
_ = ax.plot(prior_pred.prior_predictive.likelihood.values.T, color = "0.5", alpha = 0.1)
_ = ax.plot(data_transformed[target].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX], color = "red")

This however throws the below error:

ValueError                                Traceback (most recent call last)
Cell In[57], line 5
      1 #prior_pred.prior_predictive.likelihood.values.T
      2 #data_transformed[target].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX]
      4 fig, ax = plt.subplots(figsize = (20, 8))
----> 5 _ = ax.plot(prior_pred.prior_predictive.likelihood.values.T, color = "0.5", alpha = 0.1)
      6 _ = ax.plot(data_transformed[target].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX], color = "red")

File C:\Py\Lib\site-packages\matplotlib\axes\_axes.py:1688, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1445 """
   1446 Plot y versus x as lines and/or markers.
   1447 
   (...)
   1685 (``'green'``) or hex strings (``'#008000'``).
   1686 """
   1687 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1688 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1689 for line in lines:
   1690     self.add_line(line)

File C:\Py\Lib\site-packages\matplotlib\axes\_base.py:311, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
    309     this += args[0],
    310     args = args[1:]
--> 311 yield from self._plot_args(
    312     this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey)

File C:\Py\Lib\site-packages\matplotlib\axes\_base.py:507, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey)
    504     raise ValueError(f"x and y must have same first dimension, but "
    505                      f"have shapes {x.shape} and {y.shape}")
    506 if x.ndim > 2 or y.ndim > 2:
--> 507     raise ValueError(f"x and y can be no greater than 2D, but have "
    508                      f"shapes {x.shape} and {y.shape}")
    509 if x.ndim == 1:
    510     x = x[:, np.newaxis]

ValueError: x and y can be no greater than 2D, but have shapes (291,) and (291, 500, 1)

How do I replicate the PyMC3 code so that this shape error is resolved?
My current likelihood values are as below:

array([[[1184.862],
        [1187.833],
        [1191.296],
        ...,
        [1185.468],
        [1187.763],
        [1188.102]]])

This array has shape (291, 500, 1): 291 is the number of rows in the data, and 500 is the number of draws from pm.sample_prior_predictive.
Can anyone please help me resolve this?

pm.sample_prior_predictive adds a dummy chain dimension (that’s the 1 at the end of the shape there). You can get rid of it with .sel(chain=0), or by squeezing the array that .values returns.
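To illustrate, here is a minimal sketch with a synthetic DataArray standing in for prior_pred.prior_predictive.likelihood (the dims and the (1, 500, 291) shape match the thread; the random values are made up). Either approach removes the length-1 chain dimension:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for prior_pred.prior_predictive.likelihood:
# 1 chain, 500 draws, 291 observations, as in the thread.
likelihood = xr.DataArray(
    np.random.default_rng(0).normal(1190.0, 3.0, size=(1, 500, 291)),
    dims=("chain", "draw", "obs"),
    coords={"chain": [0]},
)

# Option 1: select the single chain by label.
by_sel = likelihood.sel(chain=0)          # shape (500, 291)

# Option 2: squeeze the length-1 axis out of the underlying NumPy array.
by_squeeze = likelihood.values.squeeze()  # shape (500, 291)

print(by_sel.shape, by_squeeze.shape)
```

Transposing either result gives a (291, 500) array, which matplotlib will happily plot as 500 grey lines over 291 x-positions.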

Thank you @jessegrabowski, I used .sel(chain=0) and below is the output:

This is an xarray with 291 rows, each row containing 500 samples of the likelihood, I think.
How do I reduce this to 291 rows with a single value each? Something like np.average on each row might work, but I'm not sure how to do that on an xarray.
Can you please let me know?

It might be good to check the xarray documentation. If you just want a mean over chains and draws, then you would do idata.prior_predictive.likelihood.mean(dim=['chain', 'draw']). You don’t need to sel in this case, since both dims will be reduced by the mean.
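A quick sketch of that reduction, again using a synthetic DataArray in place of the real prior predictive group (dims and shape are assumptions matching the thread):

```python
import numpy as np
import xarray as xr

# Hypothetical prior-predictive draws: 1 chain, 500 draws, 291 observations.
likelihood = xr.DataArray(
    np.random.default_rng(0).normal(1190.0, 3.0, size=(1, 500, 291)),
    dims=("chain", "draw", "obs"),
)

# Reduce over both sampling dimensions; one mean per observation remains.
mean_per_row = likelihood.mean(dim=["chain", "draw"])

print(mean_per_row.shape)  # one value per data row
```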

This discards all the distributional information in the prior, though. You could consider doing KDE plots for each sample, plotted together or separately. ArviZ has tools for this (az.plot_posterior works fine with priors; the name is a bit of a head fake, since a prior is just a posterior conditioned on no data).

Thank you so much @jessegrabowski . I will try out your suggestion and let you know where I land.

The expected output is the plot below, where the red line shows the original values and the grey lines are draws from the prior distribution of the likelihood:
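For reference, a sketch of the corrected plotting code that produces this kind of figure; synthetic data stands in here for prior_pred.prior_predictive.likelihood and data_transformed[target], which come from the actual model and data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic stand-ins: 1 chain, 500 prior draws, 291 observations,
# plus a target series of 291 values.
rng = np.random.default_rng(0)
likelihood = xr.DataArray(
    rng.normal(1190.0, 3.0, size=(1, 500, 291)),
    dims=("chain", "draw", "obs"),
    coords={"chain": [0]},
)
target_values = rng.normal(1190.0, 3.0, size=291)

fig, ax = plt.subplots(figsize=(20, 8))
# Drop the dummy chain dim, then transpose so each draw is one grey line.
_ = ax.plot(likelihood.sel(chain=0).values.T, color="0.5", alpha=0.1)
_ = ax.plot(target_values, color="red")  # 500 grey lines + 1 red line
```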

Thank you so much for the guidance. Cheers!!