Help with scan in logp calculation

Hi all,

I am having some difficulty understanding loops using scan. I am building a logp for an object X = (X_1, …, X_d) with a CustomDist that takes as a parameter an array called samples of shape (N, d). These samples are drawn from some joint probability on (X_i), and I am writing this CustomDist to approximate that joint probability.

As part of the logp calculation, there is a step where each X_i contributes a factor f_i(X_i), where f_i is the marginal pdf of the i-th component, which I estimate using a histogram of samples[:, i-1] stored in an array called marginals_hists with shape (d, 2, num_bins). To calculate f_i(X_i), I have been trying to use scan, since the function f_i is different for each i. I defined a function _get_marginal_cdf which performs the calculation of f_i(X_i) for a single dimension, and then loop over dimensions using scan, as below.

def _marginal_cdf_forward(value, marginals_hists):
    output, updates = pytensor.scan(fn=_get_marginal_cdf,
                                    sequences=[value, marginals_hists])
    transformed_value = pytensor.function(inputs=[value.T, marginals_hists],
                                          outputs=output.T,
                                          updates=updates)
    return transformed_value

However, when I try to call logp during sampling, I get the following error (including the last part of the error trace starting from the call to logp).

Cell In[42], line 68, in MvExtendedInterpolatedWithEmpiricalBetaCopula.logp(value, samples, ranks, marginals_hists)
     67 def logp(value, samples, ranks, marginals_hists):
---> 68     transformed_value = _marginal_cdf_forward(value, marginals_hists.data)
     69     ebc_density = empirical_beta_copula_density(transformed_value, samples.data)
     70     marginals_densities = _marginal_logpdfs(value, marginals_hists.data)

Cell In[41], line 248, in _marginal_cdf_forward(value, marginals_hists)
    247 def _marginal_cdf_forward(value, marginals_hists):
--> 248     output, updates = pytensor.scan(fn=_get_marginal_cdf,
    249                                     sequences=[value, marginals_hists])
    250     transformed_value = pytensor.function(inputs=[value.T, marginals_hists],
    251                                           outputs=output.T,
    252                                           updates=updates)
    253     return transformed_value

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/scan/basic.py:681, in scan(fn, sequences, outputs_info, non_sequences, n_steps, truncate_gradient, go_backwards, mode, name, profile, allow_gc, strict, return_list)
    678 else:
    679     actual_n_steps = pt.as_tensor(n_steps)
--> 681 scan_seqs = [seq[:actual_n_steps] for seq in scan_seqs]
    682 # Conventions :
    683 #   mit_mot = multiple input taps, multiple output taps ( only provided
    684 #             by the gradient function )
   (...)
    688 
    689 # MIT_MOT -- not provided by the user only by the grad function
    690 n_mit_mot = 0

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/variable.py:37, in _tensor_py_operators.__index__(self)
     36 def __index__(self):
---> 37     raise TypeError(
     38         "TensorVariable cannot be converted to Python integer. "
     39         "Call `.astype(int)` for the symbolic equivalent."
     40     )

TypeError: TensorVariable cannot be converted to Python integer. Call `.astype(int)` for the symbolic equivalent.

Any help addressing this issue would be much appreciated!

Couple of things:

  • What is the type of value and marginals_hists? Are they PyTensor variables?
  • You probably don’t want to create a PyTensor function internally.
  • Even if you wanted to create one, you can’t say inputs=[value.T, ...]. Inputs must be in the graph of outputs, and value.T is by definition a new variable.
  • I pass along value from the call to logp, so it is a TensorVariable. Originally, marginals_hists is a pt.constant, and then I pass marginals_hists.data to _marginal_cdf_forward in logp (see below).
  • Originally, I tried to just do a for loop over the dimension of the distribution by taking samples.shape[1], but it was not happy with expressions like value[:, d]. Is there another way to loop? I include at the bottom my code for _get_marginal_cdf, which is modeled on the code for pm.Interpolated.
  • Can I ask what you mean by the graph of outputs?

Here is my logp:

def logp(value, samples, ranks, marginals_hists):
    transformed_value = _marginal_cdf_forward(value, marginals_hists.data)
    ebc_density = empirical_beta_copula_density(transformed_value, samples.data)
    marginals_densities = _marginal_logpdfs(value, marginals_hists.data)

    return pt.log(ebc_density) + pt.sum(marginals_densities, axis=1)

def _get_marginal_cdf(value_column, marginals_hists_dim):
    interp = InterpolatedUnivariateSpline(marginals_hists_dim[0], marginals_hists_dim[1], k=1, ext="zeros")
    Z = interp.integral(marginals_hists_dim[0, ..., 0], marginals_hists_dim[0, ..., -1])
    interp_cdf = interp.antiderivative()
    
    interp_cdf_op = SplineWrapper(interp_cdf)
    Z = pt.constant(Z)
    
    return pt.log(interp_cdf_op(value_column) / Z)
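For what it’s worth, here is a numpy-only sketch of what each per-dimension CDF step computes, with a trapezoid integral standing in for the spline’s antiderivative. The helper name is hypothetical, and it assumes marginals_hists_dim[0] holds bin centers and marginals_hists_dim[1] holds pdf values, as the indexing above suggests:

```python
import numpy as np

def marginal_cdf_1d(x, bin_centers, pdf_vals):
    # Cumulative trapezoid integral of the histogram-based pdf, normalized so
    # the last value is 1 (this mirrors the k=1 spline + antiderivative + Z).
    increments = np.diff(bin_centers) * (pdf_vals[1:] + pdf_vals[:-1]) / 2
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    # Linear interpolation of the normalized cdf at x.
    return np.interp(x, bin_centers, cdf / cdf[-1])
```

This operates on plain ndarrays, which is why feeding it a TensorVariable slice (as discussed below) fails.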

I’ve been playing around with how to pass marginals_hists, since there is a tension between keeping marginals_hists as a TensorConstant (taking a slice of it creates a TensorVariable) and getting a 1d numpy array to feed into InterpolatedUnivariateSpline. I have tried two cases so far:

  1. I get the ndarray for marginals_hists first, but run into an issue since index is a Tensor object, or
  2. I have marginals_hists[index, 0] as a TensorVariable, but have some difficulty getting the ndarray to pass to scipy.

Do I just use eval() ? Alternatively, is there a way to keep marginals_hists as an ndarray through all of this and deal somehow with index being some Tensor object?

So the problem is that InterpolatedUnivariateSpline requires a numpy array (it doesn’t work with TensorVariables)?

Alternatively, is there a way to keep marginals_hists as an ndarray through all of this and deal somehow with index being some Tensor object?

For the specific question of indexing with a tensor on a numpy array you can do pt.as_tensor(array)[tensor_index].

Regarding .eval(): that is only safe if you’re really dealing with constants. If it’s model parameters, or expressions that combine model parameters and constants, you shouldn’t use eval or you’ll end up with a wrong model.

I have to look at your example more carefully to give a proper holistic reply. Treat my comments as the myopic responses they are.