Weird error using az.waic

When I try to calculate the waic of my trace file, I get the weird error below. Visually inspecting the trace file doesn't reveal any obvious problem, and the likelihood matrix looks reasonable. What could be the cause of this? Would I need to refit the model?
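The call that triggers it is roughly this (a minimal sketch; tr is the trace I load from disk):

import arviz as az

# tr is the loaded trace/InferenceData with a log_likelihood group
az.waic(tr)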

~/opt/anaconda3/envs/modeling/lib/python3.9/site-packages/arviz/stats/stats.py in waic(data, pointwise, var_name, scale, dask_kwargs)
   1668         warn_mg = True
   1669 
-> 1670     waic_i = scale_value * (lppd_i - vars_lpd)
   1671     waic_se = (n_data_points * np.var(waic_i.values)) ** 0.5
   1672     waic_sum = np.sum(waic_i.values)

~/opt/anaconda3/envs/modeling/lib/python3.9/site-packages/xarray/core/_typed_ops.py in __sub__(self, other)
    207 
    208     def __sub__(self, other):
--> 209         return self._binary_op(other, operator.sub)
    210 
    211     def __mul__(self, other):

~/opt/anaconda3/envs/modeling/lib/python3.9/site-packages/xarray/core/dataarray.py in _binary_op(self, other, f, reflexive)
   3523         if isinstance(other, DataArray):
   3524             align_type = OPTIONS["arithmetic_join"]
-> 3525             self, other = align(self, other, join=align_type, copy=False)  # type: ignore
...
--> 246             coord_dtype = self.coord_dtype
    247         return type(self)(index, dim, coord_dtype)
    248 

AttributeError: coord_dtype

That's odd. It seems your trace/InferenceData has an unexpected dtype somewhere. Can you share the trace/InferenceData?

Also I recommend using loo instead of waic.
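It is a drop-in replacement here (a minimal sketch, assuming tr is the same InferenceData you pass to waic):

import arviz as az

# pointwise=True also stores the per-observation values and Pareto k diagnostics
loo_res = az.loo(tr, pointwise=True)
print(loo_res)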

The trace file is too large (2.3 GB) to upload. When I refit the model using a subset of the data, I couldn't replicate the issue. Maybe uploading the file to Box somehow changed it and corrupted the types in it? I remember everything was working when I first fitted my models, but for the past few days, up until yesterday, running waic again gives this weird type error. Why is loo recommended over waic?

@OriolAbril have you seen something like this?

While theoretically both waic and loo converge, for finite samples loo has been shown to provide more robust results than waic. Additionally, loo offers diagnostics indicating when it may be failing, something waic can't provide.
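For example, the Pareto k values returned by loo flag the observations where the PSIS approximation is unreliable (a minimal sketch, again assuming tr is your InferenceData):

import arviz as az

loo_res = az.loo(tr, pointwise=True)
# k values above roughly 0.7 indicate observations where loo may be failing
print(loo_res.pareto_k)
az.plot_khat(loo_res)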

Ah, I see. But one of my models somehow has a loo value of NaN. Since I'm fitting an RL model in PyMC, maybe its complexity or the potential existence of goofy data causes loo to fail?
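One thing I can check is whether the pointwise log likelihood itself contains non-finite values, which would make loo come out as NaN (a quick sketch; "y" stands in for whatever my observed variable is actually called):

import numpy as np

# assumption: "y" is the observed variable's name in the log_likelihood group
ll = tr.log_likelihood["y"]
print(np.isfinite(ll.values).all())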

Also, I saved my trace using:

import pickle

with open('real_traces_' + agent.name + '.p', 'wb') as f:
    pickle.dump(tr, f)

Maybe the pickled .p file is not stable somehow?

You can use arviz.to_netcdf (see the ArviZ dev documentation) to save the data.
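A minimal sketch of the round trip, reusing the naming from your pickle snippet:

import arviz as az

# save the trace to a NetCDF file
az.to_netcdf(tr, 'real_traces_' + agent.name + '.nc')
# load it back later
tr = az.from_netcdf('real_traces_' + agent.name + '.nc')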

From the error it looks like an issue with the dtypes of the data, or probably not even of the data itself but of the coordinates that index it. Can you share what you get when you run

print(tr.log_likelihood)

for the case that fails? That should show the shapes and dtypes of the different elements, and even if it isn't the whole trace needed to reproduce the error, it might already be enough.

I am no expert on pickle, but it is always a source of headaches. I strongly recommend you use to_netcdf or to_zarr (see InferenceData in the ArviZ dev documentation).


OMG, pickle sucks. I loaded the trace files, saved them again with to_netcdf, and loaded the .nc files. Now everything works without error! Thank you both so much for helping me out. I guess the lesson is: don't use pickle haha
