This is something we would like to do, and we have discussed integrating the log likelihood computation and storage with Dask, but it isn't available yet. For now, the best workaround in my opinion is computing the log likelihood manually and adding it to the InferenceData. There is an example in https://python.arviz.org/en/latest/user_guide/pymc3_refitting_xr_lik.html, but with xarray-einstats it is no longer necessary to use apply_ufunc manually.
Assuming you had a linear regression model with a Student-t likelihood, it would look similar to:
from xarray_einstats.stats import XrContinuousRV
from scipy import stats

post = idata.posterior
const = idata.constant_data

mu = post["intercept"] + post["beta"] * const["x"]
df = 2.7  # degrees of freedom of the Student-t likelihood
dist = XrContinuousRV(stats.t, df, mu, post["sigma"])

# I am positive dask="parallelized" will work; dask="allowed" might too,
# depending on scipy internals, and if it works it will be more efficient
log_lik = dist.logpdf(idata.observed_data["y"], dask="parallelized")

# logpdf returns a DataArray, but InferenceData groups are Datasets
idata.add_groups(log_likelihood=log_lik.to_dataset(name="y"))
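Once the group is added, the information criteria functions pick it up as usual; nothing else changes. For example:

import arviz as az

az.loo(idata)
# or
az.waic(idata)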
I am not very familiar with numpyro, so I don’t know about this.
Exactly, if you set log_likelihood to False you won't be able to use waic or loo unless you compute it manually (as shown above, for example; how it is computed doesn't matter, what matters is that the data is there).
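If you are not sure whether a given InferenceData already has the data, you can check for the group before doing the manual computation (a small sketch; I'm using InferenceData.groups() here to list the available groups):

if "log_likelihood" not in idata.groups():
    # compute the pointwise log likelihood manually as in the snippet above
    # and attach it before calling az.loo / az.waic
    idata.add_groups(log_likelihood=log_lik.to_dataset(name="y"))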
Extra note: we are also planning to work on this, but I am not sure az.loo or az.waic will already work with log likelihood arrays that don't fit in memory. If this is something you are interested in and can help out with, it'd be very welcome.