I am trying to get the LOO and WAIC metrics for model selection, as described in the documentation. In my problem, sampling uses about 500 MB of RAM, which seems quite reasonable to me.
However, once I call pymc.compute_log_likelihood(trace), RAM usage jumps to over 21 GB! This does not scale for me, since I want to run this on a cluster.
Is there a less RAM-expensive way to compute this, or some other way to get LOO and WAIC without this much RAM consumption?
Does anyone have any pointers on this? Is this function expected to eat up so much RAM?
Is there another way of going about this?
Any suggestions @OriolAbril ?
The log_likelihood group can be pretty big. Its memory footprint is roughly chains * draws * number of observations * bytes per element of the data dtype.
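To make that formula concrete, here is a minimal sketch of the arithmetic. The function name and the example sizes are made up for illustration and are not taken from the model in this thread; it assumes float64 values (8 bytes each) and ignores any temporary copies made during the computation.

```python
def log_likelihood_nbytes(chains, draws, n_obs, itemsize=8):
    """Approximate size in bytes of a pointwise log-likelihood array.

    itemsize is the number of bytes per element (8 for float64).
    """
    return chains * draws * n_obs * itemsize


# Hypothetical sizes, chosen only to show the order of magnitude.
nbytes = log_likelihood_nbytes(chains=4, draws=1000, n_obs=650_000)
print(f"{nbytes / 1e9:.1f} GB")  # prints "20.8 GB"
```

So a few thousand posterior samples combined with a few hundred thousand observations is already enough to reach tens of GB.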
Some discussion here: "compute_log_likelihood for large datasets", Issue #6864 in pymc-devs/pymc on GitHub.
There is an open PR in ArviZ to compute loo/waic using Dask, so it will work for arrays that don't fit into memory: "start working on dask compatible loo" by OriolAbril, Pull Request #2205 in arviz-devs/arviz on GitHub. If anyone can test it, that would be great. It is also mentioned in the discussion above, along with potential workarounds, but it still needs testing!
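In the meantime, one possible workaround for WAIC specifically (not necessarily one of those proposed in the linked discussion) relies on the fact that WAIC is a sum of per-observation terms, so it can be accumulated over chunks of observations without ever holding the full array. The sketch below uses plain NumPy. Here loglik_chunk_fn is a hypothetical callable you would have to write for your own model, and the sample variance (ddof=1) is an assumption, so the result may differ slightly from library implementations. It does not cover LOO, which needs Pareto-smoothed importance sampling.

```python
import numpy as np


def chunked_waic(loglik_chunk_fn, n_obs, chunk_size=10_000):
    """Accumulate elpd_waic and p_waic over chunks of observations.

    loglik_chunk_fn(start, stop) is a hypothetical user-supplied callable
    returning an array of shape (n_samples, stop - start): the log-likelihood
    of observations start..stop-1 under every posterior sample
    (chains and draws flattened together).
    """
    elpd = 0.0
    p_waic = 0.0
    for start in range(0, n_obs, chunk_size):
        stop = min(start + chunk_size, n_obs)
        ll = np.asarray(loglik_chunk_fn(start, stop), dtype=float)
        n_samples = ll.shape[0]
        # Log of the posterior-mean likelihood per observation,
        # computed with a numerically stable log-sum-exp.
        ll_max = ll.max(axis=0)
        lppd_i = ll_max + np.log(np.exp(ll - ll_max).sum(axis=0)) - np.log(n_samples)
        # Per-observation penalty: variance of the log-likelihood over samples.
        p_i = ll.var(axis=0, ddof=1)
        elpd += float((lppd_i - p_i).sum())
        p_waic += float(p_i.sum())
    return elpd, p_waic
```

Peak memory is then about n_samples * chunk_size * 8 bytes instead of n_samples * n_obs * 8 bytes, at the cost of evaluating the likelihood chunk by chunk yourself.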