Compute_log_likelihood RAM usage is huge

dans · September 20, 2023, 2:45pm

I am trying to get access to the LOO and WAIC metrics for model selection, as described in the documentation. In my problem, sampling consumes about 500 MB of RAM, which seems quite reasonable to me.

However, once I call pymc.compute_log_likelihood(trace), the RAM usage jumps to over 21 GB! This does not scale for me, since I want to run this on a cluster.

Is there a less RAM-expensive way to compute this, or some other way to get access to LOO and WAIC without having such RAM consumption?

dans · September 26, 2023, 10:21pm

Does anyone have any pointers on this? Is this function expected to eat up so much RAM?
Is there another way of going about this?

cluhmann · September 27, 2023, 12:13am

Any suggestions @OriolAbril ?

ricardoV94 · September 27, 2023, 6:56am

The log_likelihood group can be pretty big. Its memory footprint is chains * draws * datasize * the size of the data dtype in bytes.
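To put a number on that formula, here is a minimal sketch. The function name `log_likelihood_nbytes` and the example sizes (4 chains, 1000 draws, 650,000 observations) are made up for illustration and are not taken from the original question. The estimate assumes one float64 value is stored per chain, draw, and observation.

```python
import numpy as np


def log_likelihood_nbytes(chains, draws, n_obs, dtype=np.float64):
    """Estimate the bytes needed for a dense pointwise log-likelihood array.

    Hypothetical helper: assumes one value per (chain, draw, observation).
    """
    return chains * draws * n_obs * np.dtype(dtype).itemsize


# Hypothetical sizes, chosen only to show the order of magnitude.
nbytes = log_likelihood_nbytes(chains=4, draws=1000, n_obs=650_000)
print(f"float64: {nbytes / 1e9:.1f} GB")

# Storing the same array as float32 would halve the footprint.
nbytes32 = log_likelihood_nbytes(4, 1000, 650_000, dtype=np.float32)
print(f"float32: {nbytes32 / 1e9:.1f} GB")
```

With sizes like these, the array alone is about 20 GB, even though the posterior draws themselves may only take a few hundred MB.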

Some discussion here: compute_log_likelihood for large datasets · Issue #6864 · pymc-devs/pymc · GitHub

OriolAbril · September 27, 2023, 7:43am

There is an open PR in ArviZ to compute loo/waic using Dask, so it will work for arrays that don’t fit into memory: start working on dask compatible loo by OriolAbril · Pull Request #2205 · arviz-devs/arviz · GitHub. If anyone can test it, that would be great. It is also mentioned in the discussion, along with potential workarounds, but still needs testing!
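To illustrate why out-of-core computation is possible at all, here is a rough NumPy-only sketch of WAIC computed over blocks of observations. This is not the code from the PR and does not use Dask or any ArviZ API. The names `waic_in_chunks` and `normal_loglik`, the toy normal model, and the fake posterior draws are all assumptions made for illustration. It relies on WAIC being a sum of per-observation terms, so only one block of the pointwise log-likelihood has to be in memory at a time. The penalty term uses the variance with NumPy's default `ddof=0`, and the sketch does not cover LOO.

```python
import numpy as np


def waic_in_chunks(loglik_chunk_fn, n_obs, chunk_size):
    """Accumulate elpd_waic and p_waic over blocks of observations.

    loglik_chunk_fn(start, stop) is a hypothetical callback that returns the
    pointwise log-likelihood for observations [start, stop) as an array of
    shape (n_samples, stop - start), with chains and draws already flattened.
    """
    elpd = 0.0
    p_waic = 0.0
    for start in range(0, n_obs, chunk_size):
        stop = min(start + chunk_size, n_obs)
        ll = loglik_chunk_fn(start, stop)
        n_samples = ll.shape[0]
        # Numerically stable log-mean-exp over posterior samples.
        ll_max = ll.max(axis=0)
        lppd_i = ll_max + np.log(np.exp(ll - ll_max).sum(axis=0)) - np.log(n_samples)
        # Penalty: variance of the log-likelihood over posterior samples.
        p_i = ll.var(axis=0)
        elpd += float((lppd_i - p_i).sum())
        p_waic += float(p_i.sum())
    return elpd, p_waic


# Toy example: unit-variance normal likelihood with fake "posterior" draws of mu.
rng = np.random.default_rng(0)
y = rng.normal(size=1_000)
mu_draws = rng.normal(0.0, 0.05, size=400)  # stand-in for real posterior samples


def normal_loglik(start, stop):
    resid = y[start:stop][None, :] - mu_draws[:, None]
    return -0.5 * np.log(2 * np.pi) - 0.5 * resid**2


elpd, p_waic = waic_in_chunks(normal_loglik, n_obs=y.size, chunk_size=100)
print(f"elpd_waic = {elpd:.2f}, p_waic = {p_waic:.3f}")
```

Peak memory here scales with n_samples * chunk_size instead of n_samples * n_obs, at the cost of recomputing the log-likelihood block by block from the stored posterior draws.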
