ArviZ InferenceData objects contain several groups, each holding different information relevant to the inference process. A more detailed description can be found here (note that it is still in progress: https://github.com/arviz-devs/arviz/pull/1063). The first step is therefore to convert the trace object to an ArviZ InferenceData object. The best documentation on conversion (that I know of) is the link provided above by @nkaimcaudle, so I’ll focus on handling the InferenceData object once it has been created.
log posterior
The log posterior value is stored in the sample_stats group. It can be accessed as pm_data.sample_stats.lp, which returns an xarray DataArray (add .values at the end to convert it to a numpy array).
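As a minimal sketch of that access pattern, the snippet below builds a small xarray Dataset by hand as a stand-in for pm_data.sample_stats. The values and the 2 chains by 3 draws shape are made up for illustration; with a real InferenceData object only the last two lines would be needed.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for pm_data.sample_stats (2 chains, 3 draws).
# The numbers are invented purely for illustration.
sample_stats = xr.Dataset(
    {
        "lp": (
            ("chain", "draw"),
            np.array([[-10.5, -9.8, -10.1], [-11.0, -9.9, -10.3]]),
        )
    }
)

lp_xr = sample_stats.lp         # xarray DataArray with named dimensions
lp_np = sample_stats.lp.values  # plain numpy array with shape (chain, draw)
```

The DataArray keeps the chain and draw labels, which is usually more convenient than the bare numpy array unless another library requires one.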
log likelihood
The log likelihood value is not stored, but it can be calculated from the pointwise log likelihood values, which are stored. The pointwise log likelihood lives in its own group in order to support models with multiple observed variables. The observed_data and log_likelihood groups therefore share the same variable names (with different dimensions and values, obviously), so that each pointwise log likelihood value can be mapped to the corresponding observed value.
I would recommend avoiding xarray.Dataset.to_array()
unless you are sure that either the dataset has only one variable or all its variables have exactly the same shape; otherwise the data cannot be represented as a single array and the conversion behaves in unexpected ways by broadcasting and duplicating values to make shapes match.
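To make the broadcasting issue concrete, here is a small sketch using a hand-built Dataset as a stand-in for pm_data.log_likelihood. The variable names y and obs, the dimension names, and the constant value of -1 per observation are all assumptions chosen so the totals are easy to check by hand.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for pm_data.log_likelihood: two observed variables
# with different numbers of observations (4 and 5), 2 chains, 3 draws.
log_lik = xr.Dataset(
    {
        "y": (("chain", "draw", "y_dim"), -np.ones((2, 3, 4))),
        "obs": (("chain", "draw", "obs_dim"), -np.ones((2, 3, 5))),
    }
)

# to_array broadcasts every variable against all dimensions in the dataset,
# so y is repeated along obs_dim and obs is repeated along y_dim.
stacked = log_lik.to_array()
stacked_sizes = dict(stacked.sizes)

# Summing the broadcast array counts each value several times.
naive_total = stacked.sum().item()

# Summing each variable separately gives the intended total.
true_total = log_lik.y.sum().item() + log_lik.obs.sum().item()
```

In this toy case the stacked array has 240 elements while the dataset only holds 24 + 30 = 54 pointwise values, so the naive sum is -240 instead of -54.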
Ideally, pointwise log likelihood values should be accessed using the correct variable name. For example:
pm_data.log_likelihood.y
pm_data.log_likelihood.obs
Again, use .values to get numpy arrays instead of xarray objects.
A general way of retrieving the log likelihood with multiple variables with different shapes would be:
pm_data.log_likelihood.sum().to_array().sum() # add a .item() to get a scalar
The first sum reduces all dimensions in the dataset, leaving a dataset whose variables are all scalars. At that point to_array is safe because all variables have the same shape, and the final sum adds the contributions of each variable together. With a single variable, this reduces to:
pm_data.log_likelihood.<var_name>.sum() # use .item() if needed
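The two expressions can be checked on a toy dataset. This is only a sketch: the Dataset below is a hand-built stand-in for pm_data.log_likelihood, and the variable names, dimension names, and constant values are assumptions picked so the result is easy to verify.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for pm_data.log_likelihood with two variables of
# different shapes. y contributes 24 values of -0.5, obs 30 values of -2.
log_lik = xr.Dataset(
    {
        "y": (("chain", "draw", "y_dim"), np.full((2, 3, 4), -0.5)),
        "obs": (("chain", "draw", "obs_dim"), np.full((2, 3, 5), -2.0)),
    }
)

# Reduce every variable to a scalar first, then stack and add them up.
per_variable = log_lik.sum()
total = per_variable.to_array().sum().item()

# Single-variable case: sum that variable directly.
y_total = log_lik.y.sum().item()
```

Here per_variable holds one scalar per variable (-12 for y and -60 for obs), so total is -72. Note that this sums over chains and draws as well as over observations, exactly as the one-liner above does.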
log prior
As for log prior values, they are not stored. Only samples from the prior are stored, which happens when using pm.sample_prior_predictive as explained above.