How to retrieve (log) posterior/prior/likelihood from trace?

In ArviZ InferenceData objects, there are several groups, each of which contains different information relevant to the inference process. A more detailed description can be found here (note it is still in progress https://github.com/arviz-devs/arviz/pull/1063). Thus, the first step is converting the trace object to an ArviZ InferenceData one. The best doc on conversion (that I know of) is the link provided above by @nkaimcaudle. I’ll focus on handling the InferenceData object once created.

log posterior

The log posterior value is stored in the sample_stats group, it can be accessed as pm_data.sample_stats.lp as a xarray DataArray (add a .values at the end to convert to numpy array).

log likelihood

The log likelihood value is not stored, but it can be calculated from the pointwise log likelihood values which are stored. The pointwise log likelihood is stored in its own group, this is done to support models with multiple observed values, thus, the observed_data and log_likelihood groups will have the same variable names (with different dimensions and values obviously) in order to map each pointwise log likelihood value to the corresponding observed value.

I would recommend avoiding xarray.Dataset.to_array() unless you are sure that either the dataset has only one variable or all its variables have exactly the same shape; otherwise the data cannot be represented as a single array and the conversion behaves in unexpected ways by broadcasting and duplicating values to make shapes match.

Ideally, pointwise log likelihood values should be accessed using the correct variable name. For example:

pm_data.log_likelihood.y
pm_data.log_likelihood.obs

again, use .values to get numpy arrays instead of xarray objects.

A general way of retrieving the log likelihood with multiple variables with different shapes would be:

pm_data.log_likelihood.sum().to_array().sum() # add a .item() to get a scalar

The first sum will reduce all dimensions in the dataset, and leave us with a dataset whose variables are all scalars, we can then use to_array as all variables have the same shape and then sum the contributions of each variable together. In the case of having a single variable, this can be reduced to

pm_data.log_likelihood.<var_name>.sum() # use .item() if needed

log prior

As for log prior values, they are not stored, only their samples are when using pm.sample_prior_predictive as explained above.

7 Likes