Hey, novice PyMC user here.
I’m attempting to set up pytest tests that call pm.sample_posterior_predictive as part of my project’s CI/CD. These tests have to run on a self-hosted runner for security reasons. Locally they pass without a problem, but when I run them in GitHub Actions, the process hangs as soon as sampling starts (the logs show Sampling: [] indefinitely). I’ve pinpointed the stall to the following statement in my code:
posterior_predictive_oos = pm.sample_posterior_predictive(
    trace=self.idata,
    var_names=_var_names,
    predictions=True,
    random_seed=42,
)
I’ve validated that self.idata is populated appropriately at call time and is identical to what I get locally; the same goes for _var_names. I’ve also validated that all package versions match my local environment, where the test passes. I’ve tried adding a keep-alive logging process, which also fails (it stops printing logs at the expected cadence), which makes me think the process has failed entirely (e.g. some memory issue has crashed the system). However, just before the above statement executes, the runner has more memory available than my local machine does when the test runs successfully (psutil.virtual_memory().available reports >100 GB at that point).
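One further diagnostic I could add is a watchdog that dumps every thread's stack if the call has not returned after a timeout. Below is a minimal sketch, assuming the standard-library faulthandler module behaves on the runner as it does elsewhere. The helper name dump_stacks_if_stuck, the log path, and the timeout are hypothetical and not part of my current code. As far as I understand, the faulthandler watchdog runs outside the normal Python thread machinery, so it may still write output in cases where a Python-level keep-alive thread goes quiet.

```python
import faulthandler
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(log_path, timeout_s=300.0):
    """Write all thread tracebacks to log_path every timeout_s seconds
    for as long as the wrapped block is still running.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # The file must stay open while the watchdog is armed, because
    # faulthandler writes to its file descriptor directly.
    with open(log_path, "w") as log_file:
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
        try:
            yield
        finally:
            # Disarm the watchdog once the block finishes (or raises).
            faulthandler.cancel_dump_traceback_later()
```

In the test, the stalled call would be wrapped as `with dump_stacks_if_stuck("ppc_stacks.log", timeout_s=120): ...`, and the log file inspected (or uploaded as a CI artifact) afterwards. If the file stays empty even after the timeout, that would suggest the whole process was killed or frozen rather than stuck inside Python.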
I think this issue might be related to this discussion, but I couldn’t find anything in that discussion to resolve my issue.
OS: Linux 5.15.0-1071-azure
pymc: pymc-5.17.0
pytensor: pytensor-2.25.5
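To make the comparison between the runner and my local machine easier to share, here is a minimal sketch that collects the environment details in one place. The function name environment_report and the choice of packages and environment variables are my own assumptions. I am only guessing that thread-count settings or PYTENSOR_FLAGS could matter here.

```python
import os
import platform
from importlib import metadata


def environment_report(packages=("pymc", "pytensor", "numpy", "scipy")):
    """Collect platform, package versions, and a few environment variables.

    Hypothetical helper: which packages and variables to include is a guess.
    """
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    # Variables that commonly control numerical-library threading,
    # plus PyTensor's configuration variable.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
                "OPENBLAS_NUM_THREADS", "PYTENSOR_FLAGS"):
        report[var] = os.environ.get(var, "<unset>")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this both locally and as a step in the CI job, then diffing the two outputs, would show whether anything beyond the package versions differs.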
Is this a known issue on particular architectures? Is there anything else I can do to help diagnose it?
Thank you for your help.