Memory allocation limit for NUTS with custom `logp` function (but not with VI methods)

My main question is in regards to how memory is allocated when performing NUTS sampling on a user-defined logp() model. This model (which I admit is quite onerous to compute) can be successfully fit using the Variation Inference methods but it chokes when being sampled with NUTS. The model is a mixture of several different distributions one of which is user-defined and requires numerical integration for normalization (the other three are built in pymc distributions: ExGaussian(), Normal(), and Uniform()).

The data being fit are an array of large integer values (hence the int64 data type in the error below).

See the following error when sampling with NUTS:

with model:
    res = pm.sample()

Produces the following memory error:

MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 1.47 TiB for an array with shape (201417847152,) and data type int64

The full traceback is as follows:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mL, sL0, tI0, mT0, sT0, w, mL_a, sL_a0, tI_a0, w_a]

 0.04% [3/8000 00:00<29:53 Sampling 4 chains, 0 divergences]
/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/link/ UserWarning: <class 'numpy.core._exceptions._ArrayMemoryError'> error does not allow us to add an extra error message
/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/link/ UserWarning: <class 'numpy.core._exceptions._ArrayMemoryError'> error does not allow us to add an extra error message
RemoteTraceback                           Traceback (most recent call last)
Traceback (most recent call last):
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/", line 129, in run
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/", line 182, in _start_loop
    point, stats = self._compute_point()
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/", line 207, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/", line 286, in step
    return super().step(point)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/", line 208, in step
    step_res = self.astep(q)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 186, in astep
    hmc_step = self._hamiltonian_step(start,, step_size)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 194, in _hamiltonian_step
    divergence_info, turning = tree.extend(direction)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 295, in extend
    tree, diverging, turning = self._build_subtree(
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 373, in _build_subtree
    return self._single_step(left, epsilon)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 333, in _single_step
    right = self.integrator.step(epsilon, left)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 73, in step
    return self._step(epsilon, state)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/step_methods/hmc/", line 109, in _step
    logp = self._logp_dlogp_func(q_new, grad_out=q_new_grad)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/", line 410, in __call__
    cost, *grads = self._aesara_function(*grad_vars)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/function/", line 984, in __call__
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/link/", line 534, in raise_with_op
    raise exc_value.with_traceback(exc_trace)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/function/", line 971, in __call__
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/graph/", line 543, in rval
    r = p(n, [x[0] for x in i], o)
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/tensor/", line 2923, in perform
    out[0] = np.arange(start, stop, step, dtype=self.dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.47 TiB for an array with shape (201417847152,) and data type int64

The above exception was the direct cause of the following exception:

MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 1.47 TiB for an array with shape (201417847152,) and data type int64

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Input In [7], in <cell line: 2>()
      1 import pymc as pm
      2 with model:
----> 3     res = pm.sample()

File ~/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/, in sample(draws, step, init, n_init, initvals, trace, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, **kwargs)
    607 _print_step_hierarchy(step)
    608 try:
--> 609     mtrace = _mp_sample(**sample_args, **parallel_args)
    610 except pickle.PickleError:
    611     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/, in _mp_sample(draws, tune, step, chains, cores, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, **kwargs)
   1519 try:
   1520     with sampler:
-> 1521         for draw in sampler:
   1522             strace = traces[draw.chain]
   1523             if strace.supports_sampler_stats and draw.stats is not None:

File ~/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/, in ParallelSampler.__iter__(self)
    460     self._progress.update(self._total_draws)
    462 while self._active:
--> 463     draw = ProcessAdapter.recv_draw(self._active)
    464     proc, is_last, draw, tuning, stats, warns = draw
    465     self._total_draws += 1

File ~/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/, in ProcessAdapter.recv_draw(processes, timeout)
    351     else:
    352         error = RuntimeError("Chain %s failed." % proc.chain)
--> 353     raise error from old_error
    354 elif msg[0] == "writing_done":
    355     proc._readable = True

RuntimeError: Chain 3 failed.

However, when using ADVI, the fitting is possible (if not slow):

vi = pm.ADVI(model=model)
approx =

Output (intentionally interrupted mid fitting):

 30.88% [3088/10000 06:25<14:22 Average Loss = 1.0235e+05]
Interrupted at 3,088 [30%]: Average Loss = 1.0395e+05

And the final fit result using VI is generally what I would expect (the model fits the data and the resulting best fit parameter values make sense). However, my suspicion is the posterior distributions are actually a bit more complicated (e.g. possibly bimodal) which is why I’m trying to figure out how to get the NUTS sampler to work.

Though the full mixture is too much to report here, the following is the logp function (and the functions on which it depends) for the custom component of the full model:

import aesara.tensor as tt
import pymc as pm

# CDF/logCDF components
def _emg_cdf(x, mu, sigma, tau):
    rv = pm.ExGaussian.dist(mu=mu,sigma=sigma, nu=tau)
    lcdf = pm.logcdf(rv, x)
    return tt.exp(lcdf)

def _log_emg_cdf(x, mu, sigma, tau):
    rv = pm.ExGaussian.dist(mu=mu,sigma=sigma, nu=tau)
    lcdf = pm.logcdf(rv, x)
    return lcdf

def _norm_sf(x, mu, sigma):
    arg = (x - mu) / (sigma * tt.sqrt(2.0))
    return 0.5 * tt.erfc(arg)

def _log_norm_sf(x, mu, sigma):
    return pm.distributions.dist_math.normal_lccdf(mu, sigma, x)

# Custom log pdf
def e_logp(x, mL, sL, tI, mT, sT):
    # Compute norm factor by numeric integrating over entire distribution
    _n = 10 #number of stdevs for numerical normalization
    _min = tt.floor(tt.min([mL-_n*sL, mT-_n*sT]))
    _max = tt.ceil(tt.max([mL+_n*np.sqrt(sL**2+tI**2), mT+_n*sT]))

    _x = tt.arange(_min, _max, dtype="int64")

    _norm_array = (
        _emg_cdf(_x, mu=mL, sigma=sL, tau=tI) 
        *_norm_sf(_x, mu=mT, sigma=sT)
    _log_norm_factor = tt.log(tt.sum(_norm_array))

    # Unnormalized dist values (log(CDF*SF) = log(CDF) + log(SF))
    _log_unscaled = (
        _log_emg_cdf(x, mu=mL, sigma=sL, tau=tI)
        +_log_norm_sf(x, mu=mT, sigma=sT)

    # Normalize distribution in logscale
    log_pdf = _log_unscaled - _log_norm_factor

    return log_pdf

I then use DensityDist() to generate a component RV for the mixture:

e_pdf = pm.DensityDist.dist(mL, sL, tI, mT, sT, logp=e_logp, class_name='e_pdf')

(all parameters mLsT have normal or exponential priors with reasonable length scales on the order of 1e4)

I guess what I’m asking is what is it about this model that’s resulting it too great of memory allocation when fitting with NUTS but not with the VI methods? Any insight y’all could provide would be greatly appreciated!

An aside question: Why does the .dist() method now require a class_name string to identify it when in PyMC3 this was not required?

This looks like some kind of bug to me, either in pymc, aesara or in your model. Can you post something that I can reproduce locally?

1 Like

Because we have to create a Python class dynamically, and before we didn’t.

1 Like

@aseyboldt and @ricardoV94 thank you for the willingness to help—your insights are always appreciated.

The below code is the most condensed I could make my pipeline such that it still produces the aforementioned bug. The model consists of 4 components (two conventional distributions–ExGauss and Norm, the custom distribution stated earlier in the thread, and a uniform background). This first block generates a mock dataset (dependencies scipy and numpy) to which I apply the pymc model.

Mock Data

import numpy as np
import scipy as sp

# Mock Parameters
m0 = 0
s0 = 1000
t0 = 500
m2 = 15000
s2 = 2500
w = [0.15, 0.55, 0.15, 0.15]
N = 10000
x = np.array(range(-10000, 25000))

# Model components
def rvs_0(m, s, t, size=1000, seed=42):
    samples = sp.stats.exponnorm.rvs(t/s, m, s, size=size, random_state=seed)
    samples = np.int0(np.round(samples))
    return samples

def pdf_1(x, m0, s0, t0, m1, s1):

    cdf = sp.stats.exponnorm.cdf(x, t0/s0, m0, s0)
    sf = sp.stats.norm.sf(x, m1, s1)
    unscaled = np.nan_to_num(cdf * sf)

    xmin = int(min(m0 - 10*s0, m1 - 10*s1))
    xmax = int(max(m0 + 10*np.sqrt(s0**2 + t0**2), m1 + 10*s1))
    xfull = np.array(range(xmin, xmax))

    cdf = sp.stats.exponnorm.cdf(xfull, t0/s0, m0, s0)
    sf = sp.stats.norm.sf(xfull, m1, s1)
    norm_factor = sum(np.nan_to_num(cdf * sf))

    pdf = np.nan_to_num(unscaled/norm_factor)
    return pdf

def rvs_1(m0, s0, t0, m1, s1, size=1000, seed=42):

    # Adjust <x> so that it encompasses full normalization range (xfull)
    xmin = int(min(m0 - 10*s0, m1 - 10*s1))
    xmax = int(max(m0 + 10*np.sqrt(s0**2 + t0**2), m1 + 10*s1))
    xfull = np.array(range(xmin, xmax))

    pdf = pdf_1(xfull, m0=m0, s0=s0, t0=t0, m1=m1, s1=s1)

    # Adjust pdf to sum to 1.0 (residue from finite normalization integration)
    residue = 1.0 - sum(pdf)
    pdf[-1] = pdf[-1] + abs(residue)

    samples = np.random.choice(xfull, size=size, replace=True, p=pdf)
    return samples

def rvs_2(m, s, size=1000, seed=42):
    samples = sp.stats.norm.rvs(m, s, size=size, random_state=seed)
    samples = np.int0(np.round(samples))
    return samples

def rvs_b(x, size=10, seed=42):
    samples = np.random.choice(x, size=size, replace=True)
    return samples

# Data point distributions
w0, w1, w2, wb = w
N0 = int(w0 * N)
N1 = int(w1 * N)
N2 = int(w2 * N)
Nb = int(wb * N)

# Full dataset 
data = np.concatenate((
    rvs_0(m0, s0, t0, size=N0), 
    rvs_1(m0, s0, t0, m2, s2, size=N1), 
    rvs_2(m2, s2, size=N2), 
    rvs_b(x, size=Nb)), 

PyMC model build

The above generated data set is then used as the observed data (in full_model) in the following code block in which I create the PyMC mixture model.

import pymc as pm
import aesara.tensor as at

with pm.Model() as mod:

    # CDF/logCDF components
    def _emg_cdf(x, mu, sigma, tau):
        rv = pm.ExGaussian.dist(mu=mu,sigma=sigma, nu=tau)
        lcdf = pm.logcdf(rv, x)
        return at.exp(lcdf)

    def _log_emg_cdf(x, mu, sigma, tau):
        rv = pm.ExGaussian.dist(mu=mu,sigma=sigma, nu=tau)
        lcdf = pm.logcdf(rv, x)
        return lcdf

    def _norm_sf(x, mu, sigma):
        arg = (x - mu) / (sigma * at.sqrt(2.0))
        return 0.5 * at.erfc(arg)

    def _log_norm_sf(x, mu, sigma):
        return pm.distributions.dist_math.normal_lccdf(mu, sigma, x)

    # Custom log pdf
    def comp1_logp(x, m0, s0, t0, m2, s2):
        # Compute norm factor by numeric integrating over entire distribution
        _n = 10 #number of stdevs for numerical normalization
        _min = at.floor(at.min([m0-_n*s0, m2-_n*s2]))
        _max = at.ceil(at.max([m0+_n*np.sqrt(s0**2+t0**2), m2+_n*s2]))

        _x = at.arange(_min, _max, dtype="int64")

        _norm_array = (
            _emg_cdf(_x, mu=m0, sigma=s0, tau=t0) 
            *_norm_sf(_x, mu=m2, sigma=s2)
        _log_norm_factor = at.log(at.sum(_norm_array))

        # Unnormalized dist values (log(CDF*SF) = log(CDF) + log(SF))
        _log_unscaled = (
            _log_emg_cdf(x, mu=m0, sigma=s0, tau=t0)
            +_log_norm_sf(x, mu=m2, sigma=s2)

        # Normalize distribution in logscale
        log_pdf = _log_unscaled - _log_norm_factor

        return log_pdf

    # Define parameter priors
    m0 = pm.Normal('m0', mu=0, sigma=2000)
    s0_0 = pm.Exponential('s0_0', lam=1/5000)
    s0 = pm.Deterministic('s0', s0_0 + 10)
    t0_0 = pm.Exponential('t0_0', lam=1/500)
    t0 = pm.Deterministic('t0', t0_0 + 10)
    m2_0 = pm.Exponential('m2_0', lam=1/10000)
    m2 = pm.Deterministic('m2', m2_0 + 10000)
    s2_0 = pm.Exponential('s2_0', lam=1/5000)
    s2 = pm.Deterministic('s2', s2_0 + 10)
    w = pm.Dirichlet('w', a=np.array([1, 1, 1, 1]))
    # Define distributions for mixture model component
    comp0 = pm.ExGaussian.dist(mu=m0, sigma=s0, nu=t0)
    comp1 = pm.DensityDist.dist(m0, s0, t0, m2, s2, logp=comp1_logp, class_name='comp1')
    comp2 = pm.Normal.dist(mu=m2, sigma=s2)
    compb = pm.Uniform.dist(lower=data.min(), upper=data.max())
    components = [comp0, comp1, comp2, compb]
    # Define observed model
    full_model = pm.Mixture('full_model', w=w, comp_dists=components, observed=data)

ADVI optimization

Using VI the model seems to run without error (if not slowly):

vi = pm.ADVI(model=mod)
approx =

Output (terminated prematurely):

 2.20% [220/10000 00:29<21:33 Average Loss = 1.0545e+05]
Interrupted at 220 [2%]: Average Loss = 1.0542e+05

NUTS sampling error

with mod:
    trace = pm.sample()

Output (truncated traceback):

The above exception was the direct cause of the following exception:

MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 511. GiB for an array with shape (68551114498,) and data type int64


print(f"numpy: {np.__version__}")
print(f"scipy: {sp.__version__}")
print(f"pymc: {pm.__version__}")
numpy: 1.23.4
scipy: 1.9.3
pymc: 4.2.2

You should be able to run both mock data sim and model build exactly as is above to reproduce the error. I’m hoping it’s just a dumb mistake on my part, but if it is in fact a pymc bug I’d appreciate any suggestions you might have for temporary work arounds. I’m applying this model to tens of thousands of data sets and would like to scale it up to hundreds of thousands, but given the current speed of a single model fitting instance this is compute time prohibitive, even on our local compute cluster. Your sage advice is greatly appreciated. Much obliged!

1 Like

I haven’t had time to check the code yet, but the issue might come from this line. What are the ranges you’d expect for _min and _max?

You need to keep in mind that pm.sample runs mcmc and will explore all the parameter space, including the tails, not only around the mean/mode like an optimizer. From the prior it looks like the allowed range is huge. Does the error always happen with the same shape independently of tge seed?

1 Like

Hey @OriolAbril, thanks for taking a look at my issue. Yes, that was my initial suspicion as well. Since that model component (the user defined component comp1) is proportional to the product of the EMG cumulative distribution function and the Normal survival function (i.e. the _log_unscaled definition which = log(CDF*SF)) it must be numerically normalized in order to serve as a well behaved probability distribution (I can’t find a way to analytically solve that product to find a closed-form normalization factor). To numerically normalize it, I am numerically integrating it over a large range to approximate the integration over (-\infty, \infty). The _x array is that numerical integration range, the length of which will of course depend on the values of all the parameters at each step in the fitting. The _min and _max values of that integration range are also defined to be a large number of standard deviations (for a given choice of parameter values) from each end of the probability distribution, set by _n (in order to make sure the resulting probability value is of high precision and, thus, well-behaved). In this case _n = 10 standard deviations. So it seems like it would be a large numeric integration range, as you say.

However, I calculated an extreme case for this range and it doesn’t seem like it would be the culprit. Say I sampled 10\sigma out on each of the prior distributions (pretty far out into the tails) for the given values of the hyperparameters. In that case, the length of _x is still only on the order of 10e6. See below:

Nsig = 10

_n = 10
m0 = -Nsig*2000
s0 = Nsig*5000
t0 = Nsig*500
m2 = Nsig*10000 + 10000
s2 = Nsig*5000

_min = np.floor(min([m0-_n*s0, m2-_n*s2]))
_max = np.ceil(max([m0+_n*np.sqrt(s0**2+t0**2), m2+_n*s2]))
print(f"Length of _x (min:{_min}, max:{_max}): {abs(_min-_max):e}")
print(f"_x approx size in bytes: len*64/8 = {abs(_min-_max)*64/8:e}")


Length of _x (min:-520000.0, max:610000.0): 1.130000e+06
_x approx size in bytes: len*64/8 = 9.040000e+06

And 9e6 is still way smaller than the GiB scale of the memory error. This leads me to believe that the size of _x is not actually the memory bottleneck. Does that make sense? Am I missing something?

Yeah, the error always happens with the same shape of priors (indep of seed). However, the size of the array allocation leading to the memory error does vary. For example, I just ran it again and got the following:

MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 484. TiB for an array with shape (133056876572664,) and data type int64

This is much larger than the previously mentioned memory error. The stated memory value is random from run to run, but always in that same GiB to TiB range.

Also, along these same lines, I have decreased _n to 3 and changed dtype=int32 in order to decrease the length of _x and the memory footprint for each element to see if that solved the issue, but I’m still getting the same error.

As an aside: Is the best way of determining the length of a tensor object in aseara to use the .eval() method and then check the length? i.e.

x = at.arange(0, 100)



Or is there a more streamlined way that doesn’t require you to evaluate the actual tensor array?

I don’t know the aesara and pymc API’s/backend all that well yet, but am trying to learn.

You can do x.shape.eval() which for most cases will will not require computing the actual tensor outputs.

1 Like

I don’t know how ADVI works, but it seems to produce a much more compact graph.

You may need to dig a bit and profile the d/logp functions for your model: Profiling Aesara function — Aesara 2.8.7+37.geadc6e33e.dirty documentation

You can obtain them via model.compile_fn(model.logp(), point_fn=False), model.compile_fn(model.dlogp(), point_fn=False)) and the value and logp combined via model.compile_fn([model.logp(), model.dlogp()], point_fn=False)

1 Like

@ricardoV94 thanks for the guidance. I read through the tutorial at the link you provided, however I believe I’m missing some basics on how to perform the profiling. I think I understand the profiling example at the bottom of the page (here: aesara/ at main · aesara-devs/aesara · GitHub) but I don’t fully understand how that relates to the compiled model functions you mention.

Is the idea to use d/logp functions obtained using model.compile_fn() as the aesara function in that linked example, and then provide the function a bunch of random inputs across the input parameter space so the profiler can get estimates on where in the model time and memory are being used?

For example I can create the compiled logp function as follows:

comp_logp = mod.compile_fn(mod.logp(), point_fn=False)


<aesara.compile.function.types.Function at 0x2abdabf3c790>

Which appears to have the following inputs:


This aesara function I should be able to evaluate by providing it inputs for all model parameters, like so:

    w_simplex__=np.array([0.25, 0.25, 0.25])



So, in order to profile comp_logp I should just evaluate it some 10k times with random input values and with:

aesara.config.profile = True
aesara.config.profile_memory = True

in order to produce a profile report. Does that all seem correct? I’d just like to make sure I’m understanding your suggestion.

Assuming I’ve got the above right, I have a few other questions regarding the inputs to the compiled logp function:

  • Are all parameters ending in _log__ simply representing the log of the underlying parameter, i.e. is s0_0_log__ equal to np.log(s0_0)?
  • Why is m0 the only non-logged parameter in the inputs?
  • What is the form of w_simplex__? Is it the weights of the first three components of the model and the last weight is inferred as the remained to sum to 1?

I appreciate any feedback, as I’ve still got a lot to learn about the inner workings of aesara. Thanks!

Yes your intuition is mostly correct, except you don’t necessarily need to test the function with random inputs (although sometimes the bottlenecks can be coordinate specific)

You can get a useable point via model.initial_point(). Evaluating the function a couple hundred times on this point should give you a stable profile.

The underscore names are for transformed parameters. Positive distributions will by default have a log transform and the variable name will be f"{name}_log__", other distributions have different default transforms (e.g, simplex for Dirichlet), and some will have none like the Normal, as any value has a nonzero density.

If you want to generate multiple points you can set initval="prior" for each variable and everytime you call model.initial_point you’ll get a new point. But again, a single initial point is usually a good enough for benchmarking.

1 Like

I still seem to be doing something incorrectly. Based on your last message, I ran the following code along with the full model build described above:

ae.config.profile = True
ae.config.profile_memory = True

for _ in range(100):

However, it returns nothing to stdout (no error either). What am I missing? Thanks!

@ricardoV94 also:

I assume I have to include this kwarg into the call to pm.Mixture? I did this by changing the last line of the model to:

full_model = pm.Mixture('full_model', w=w, comp_dists=components, observed=data, initval="prior")

However, when I call model.initial_point() multiple times, it returns the same dictionary:

{'m0': array(0.),
 's0_0_log__': array(8.51719319),
 't0_0_log__': array(6.2146081),
 'm2_0_log__': array(9.21034037),
 's2_0_log__': array(8.51719319),
 'w_simplex__': array([0., 0., 0.])}

Am I including the initval="prior" kwarg in the wrong place?

Ah, it seems I can’t even get the basic profiling example (at the bottom of the page: Profiling Aesara function — Aesara 2.8.7+37.geadc6e33e.dirty documentation) to run correctly.

The file containing the following:

import numpy as np

import aesara

x, y, z = aesara.tensor.vectors("xyz")
f = aesara.function([x, y, z], [(x + y + z) * 2])
xv = np.random.random((10,)).astype(aesara.config.floatX)
yv = np.random.random((10,)).astype(aesara.config.floatX)
zv = np.random.random((10,)).astype(aesara.config.floatX)
f(xv, yv, zv)

If I run it as follows (as stated in the example):

(pymc_env) $ AESARA_FLAGS=optimizer_excluding=fusion:inplace,profile=True python

I get the following error:

Exception ignored in atexit callback: <function _atexit_print_fn at 0x7f4d4f7ed120>
Traceback (most recent call last):
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/", line 78, in _atexit_print_fn
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/", line 1452, in summary
  File "/Users/jast/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/", line 786, in summary_function
    print("Function profiling", file=file)
AttributeError: 'str' object has no attribute 'write'

Any thoughts?

Yes. The observed variable is the only one that is not part of the initial point :). You should set initval="prior" for every other unobserved variable in the model if you want their value to change everytime you call model.initial_point()

You don’t want to profile the initial point function, that’s just to get values you can use to profile the d/logp functions:

import pymc as pm

with pm.Model() as m:
  x = pm.Normal("x", initval="prior")
  y = pm.Normal("y", x, observed=[0])

f = m.compile_fn(m.logp(), profile=True)
for i in range(1000):
  ip = m.initial_point()

f.f accesses the actual aesara function

By the way, if after you figure out how to profile PyMC models you want to contribute that knowledge to our docs, that would be super valuable :slight_smile:

Also I forgot, there is a model.profile that you can use that does the calling with the initial point (just one) for you automatically :man_facepalming: :


We do have a notebook on profiling but it is still using 3.9: Profiling — PyMC example gallery

1 Like