Error with custom distribution after using scan

Juan_Martinez · December 16, 2024, 10:28am

Hi all,

While building a time series model I’m encountering the following error: ValueError: traffic_innov is not in list. When I look at the list in question, I do find a variable named traffic_innov, so my best hypothesis at this point is that a copy is being made somewhere and then equality comparison fails despite the objects being identical from the modelling perspective.

Any help would be very much appreciated

Quick context

It’s a model of arrival times which depend on traffic. Traffic has a persistent behaviour (modelled with an AR) and is also affected by weather. Basically there are three parameters, a scan loop and a likelihood (times have been floored, hence the use of a shifted geometric instead of exponential).

I’ve tried to reduce the model’s complexity as much as possible, so the code below is not entirely representative of what I’m trying to do but it serves to reproduce the error I’m stuck with.

MRE

import numpy as np
import pandas as pd
import pymc as pm
from pymc.pytensorf import collect_default_updates
import pytensor


# Create fake data
lags = 1
n_stops = 5
steps = n_stops - lags
n_trips = 2
stop_sequence = np.arange(n_stops, dtype=int)
trips = np.arange(n_trips, dtype=int)
delay_delta = np.array(
    [
        [1, 1, 1, 2, 4],
        [2, 2, 1, 1, 1],
    ]
)
weather_states = ["sunny", "rainy", "storm"]
weather = np.array(
    [
        [0, 0, 0, 1, 2],
        [1, 1, 1, 0, 0],
    ]
)

delay_df = pd.DataFrame(delay_delta, columns=stop_sequence)
weather_df = pd.DataFrame(weather, columns=stop_sequence)
data = pd.concat([delay_df, weather_df], axis=1, keys=["delay", "weather"])

coords = {
    "weather_states": weather_states,
    "lags": range(-lags, 0),
    "stops": stop_sequence,
    "steps": stop_sequence[:-lags],
    "instances": trips,
}


def ar_step(weather, traffic_t1, rho, mu, sigma):
    innov = pm.LogNormal.dist(mu=mu[weather[0]], sigma=sigma[weather[0]])
    traffic = traffic_t1 * rho[0] + innov * (1 - rho[0])
    return traffic, collect_default_updates([traffic])


def ar_dist(ar_init, weather, rho, mu, sigma, n_steps, size):
    ar_innov, _ = pytensor.scan(
        fn=ar_step,
        sequences=[
            {"input": weather, "taps": 0},
        ],
        outputs_info=[{"initial": ar_init, "taps": [-lags]}],
        # outputs_info=[{"initial": ar_init, "taps": range(-lags, 0)}],
        non_sequences=[rho, mu, sigma],
        n_steps=n_steps,
        strict=True,
    )
    return ar_innov


def geom_shift(scale, size):
    prob = 1 - pm.math.exp(-1 / scale)
    geom = pm.Geometric.dist(p=prob, size=size)
    return geom - 1


with pm.Model(coords=coords) as ar_model:
    # Data
    delay_follow = pm.Data(
        "delay_follow_obs",
        data["delay"].values[:, 1:],
        dims=("instances", "steps"),
    )
    weather_first = pm.Data(
        "weather_first_obs", data["weather"].values[:, 1], dims=("instances",)
    )
    weather_follow = pm.Data(
        f"weather_follow_obs",
        data["weather"].values[:, 1:],
        dims=(
            "instances",
            "steps",
        ),
    )
    # Parameters
    rho = pm.Beta("rho", 5, 1, dims=("lags",))  # traffic persistence
    traffic_weather_mu = pm.Normal(
        "traffic_weather_mu", 0, 1, dims=("weather_states",)
    )
    traffic_weather_sigma = pm.HalfNormal(
        "traffic_weather_sigma", 1, dims=("weather_states",)
    )
    # AR Model
    traffic_init = [
        pm.LogNormal(
            f"init{idx}",
            traffic_weather_mu[weather],
            traffic_weather_sigma[weather],
        )
        for idx, weather in enumerate(weather_first)
    ]
    traffic_innov = pm.CustomDist(
        f"traffic_innov",
        traffic_init,
        weather_follow.reshape((steps, -1)),
        rho,
        traffic_weather_mu,
        traffic_weather_sigma,
        steps,
        dist=ar_dist,
        dims=(
            f"steps",
            f"instances",
        ),  # scan iterates over leftmost dimension
    )
    # Likelihood
    pm.CustomDist(
        "delay",
        traffic_innov.T,
        dist=geom_shift,
        observed=delay_follow,
        dims=("instances", "steps"),
    )

ar_model.point_logps()

with ar_model:
    idata = pm.sample()

Output:

...

File ~\miniconda3\envs\bus\lib\site-packages\pymc\logprob\basic.py:516, in <listcomp>(.0)
    513 node_rvs = [valued_var.inputs[0] for valued_var in valued_nodes]
    514 node_values = [valued_var.inputs[1] for valued_var in valued_nodes]
    515 node_output_idxs = [
--> 516     fgraph.outputs.index(valued_var.outputs[0]) for valued_var in valued_nodes
    517 ]
    519 # Replace `RandomVariable`s in the inputs with value variables.
    520 # Also, store the results in the `replacements` map for the nodes that follow.
    521 for node_rv, node_value in zip(node_rvs, node_values):

ValueError: traffic_innov is not in list

Additional comments

But when I check fgraph.outputs it does seem to have traffic_innov:

(Pdb++) fgraph.outputs
[ValuedRV.0, traffic_weather_mu, ValuedRV.0, ValuedRV.0, ValuedRV.0, traffic_innov, delay]

If I replace the shifted geometric (custom distribution) by the standard geometric it works well (sampling is full of divergences but it’s not like I’m feeding it sensible data), so that seems to be playing a key role.

    pm.Geometric(
        "delay",
        p=1 - pm.math.exp(-traffic_innov.T),
        observed=delay_follow,
        dims=("instances", "steps"),
    )

Still I have used custom distributions without trouble and it seems to me there’s something in the interaction with scan that’s leading to the error. I’ve tried a similar model in which traffic is not affected by weather (and hence there’s only one traffic_init variable) and there everything works fine.

Versions

pymc: 5.17 and 5.19.1
pytensor: 2.25.5 and 2.26.4

Juan_Martinez · January 30, 2025, 9:50am

Update

After fixing a simpler version, let’s call it v1 (Scan related error using nutpie for an autoregressive model - #3 by ricardoV94), I’ve come back to this one, let’s call it v2. v1 does work with custom distributions, and I think the key difference is that while v1 has a single initial distribution:

    traffic_init = pm.LogNormal("traffic_init", traffic_mu, traffic_sigma, dims=("instances",))

v2 has a list of possible elements and one is selected depending on the weather at that time:

    traffic_init = [
        pm.LogNormal(
            f"traffic_init_{idx}",
            traffic_weather_mu[weather],
            traffic_weather_sigma[weather],
        )
        for idx, weather in enumerate(weather_first)
    ]

So I think the issue must be here but I can’t find it. I’ve tried wrapping the traffic_init list in a tensor variable using pytensor.tensor.stack() but it does not help. I can get draws just fine:

pm.draw(ar_model["delay"])

it’s when I call ar_model.point_logps() (or pm.sample) that fails.

Versions

pymc==5.20.0
pytensor==2.26.4

ricardoV94 · January 30, 2025, 10:32am

What error do you get?

Juan_Martinez · January 30, 2025, 11:03am

The error is the following (I’ve omitted the

----> 1 ar_model.point_logps()  # for custom dists, to check logp has been derived

File ~\miniconda3\envs\pymc-feature\Lib\site-packages\pymc\model\core.py:1798, in Model.point_logps(sel
f, point, round_vals, **kwargs)
   1795     point = self.initial_point()
   1797 factors = self.basic_RVs + self.potentials
-> 1798 factor_logps_fn = [pt.sum(factor) for factor in self.logp(factors, sum=False)]
   1799 return {
   1800     factor.name: np.round(np.asarray(factor_logp), round_vals)
   1801     for factor, factor_logp in zip(
   (...)
   1804     )
   1805 }

File ~\miniconda3\envs\pymc-feature\Lib\site-packages\pymc\model\core.py:696, in Model.logp(self, vars,
 jacobian, sum)
    694 rv_logps: list[TensorVariable] = []
    695 if rvs:
--> 696     rv_logps = transformed_conditional_logp(
    697         rvs=rvs,
    698         rvs_to_values=self.rvs_to_values,
    699         rvs_to_transforms=self.rvs_to_transforms,
    700         jacobian=jacobian,
    701     )
    702     assert isinstance(rv_logps, list)
    704 # Replace random variables by their value variables in potential terms

File ~\miniconda3\envs\pymc-feature\Lib\site-packages\pymc\logprob\basic.py:595, in transformed_conditi
onal_logp(rvs, rvs_to_values, rvs_to_transforms, jacobian, **kwargs)
    592     transform_rewrite = TransformValuesRewrite(values_to_transforms)  # type: ignore[arg-type]
    594 kwargs.setdefault("warn_rvs", False)
--> 595 temp_logp_terms = conditional_logp(
    596     rvs_to_values,
    597     extra_rewrites=transform_rewrite,
    598     use_jacobian=jacobian,
    599     **kwargs,
    600 )
    602 # The function returns the logp for every single value term we provided to it.
    603 # This includes the extra values we plugged in above, so we filter those we
    604 # actually wanted in the same order they were given in.
    605 logp_terms = {}

File ~\miniconda3\envs\pymc-feature\Lib\site-packages\pymc\logprob\basic.py:514, in conditional_logp(rv
_values, warn_rvs, ir_rewriter, extra_rewrites, **kwargs)
    511 node_rvs = [valued_var.inputs[0] for valued_var in valued_nodes]
    512 node_values = [valued_var.inputs[1] for valued_var in valued_nodes]
    513 node_output_idxs = [
--> 514     fgraph.outputs.index(valued_var.outputs[0]) for valued_var in valued_nodes
    515 ]
    517 # Replace `RandomVariable`s in the inputs with value variables.
    518 # Also, store the results in the `replacements` map for the nodes that follow.
    519 for node_rv, node_value in zip(node_rvs, node_values):

ValueError: traffic_t is not in list

(I’ve renamed traffic_innov in the original post to traffic_t).

There’s a variable named traffic_t in fgraph.outputs so I figured it may be that is not able to find it because a copy has been made of the original object but this is just a supposition.

ricardoV94 · January 30, 2025, 1:59pm

Gonna have to play with it, first glance looks like a bug!

Juan_Martinez · January 31, 2025, 8:14am

Thanks! Happy to help with anything that I can.

Juan_Martinez · February 20, 2025, 7:49am

Hey, I just wanted to follow up on this. I know that you have a lot to do and I’m okay with this being on hold for an indefinite amount of time. I just wanted to know what’s on your mind (and “nothing” is a perfectly valid answer) to adjust my expectations. This is not a blocking issue for me, I can use alternative modelling approaches to solve the same underlying problem.

Regards and I hope everything is going well,

ricardoV94 · February 24, 2025, 9:03am

Surprisingly the bug is related to the second CustomDist, not the one from Scan

ricardoV94 · February 24, 2025, 9:52am

While investigating just wanted to say you probably don’t want to do weather_follow.reshape((n_steps, -1)) but weather_follow.T, they are not the same

ricardoV94 · February 24, 2025, 11:44am

@Juan_Martinez bug will hopefully be fixed by Fix bug with chained CustomSymbolicDists by ricardoV94 · Pull Request #7690 · pymc-devs/pymc · GitHub

Thanks for reporting and bugging me to check it out

Juan_Martinez · February 24, 2025, 3:28pm

True! I don’t know why I was just thinking in terms of getting the right shape and missed that completely. Thanks a lot! It’s such a basic operation I don’t think I would’ve paid much attention when debugging, so it would have taken me a lot of time to find that.

Thank you very much !!! I’ll be looking forward to the next release!

ricardoV94 · February 24, 2025, 5:25pm

I recommend only using reshape for combining/splitting dimensions. For everything else transpose, squeeze, and expand_dims are more readable and less error prone.

There’s also einops these days with PyTensor: GitHub - arogozhnikov/einops: Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) so you can do it with a dims-like interface

Topic		Replies	Views
Scan related error using nutpie for an autoregressive model Questions bug , time_series , scan_ops , nutpie	4	63	January 29, 2025
Implementing Chinese Restaurant Process - pytensor error on CustomDist v5	3	345	February 5, 2024
Mu of pm.Normal() depends on output of RandomVariable on previous time step. How do I implement this? v5 time_series , modeling	8	731	July 26, 2023
Declare an internal Poisson distribution inside a custom distribution failes Questions	21	2001	April 30, 2018
Hitting a weird error to do with RNGs in Scan in a custom function inside a Potential v5 pytensor	19	808	October 23, 2023

Error with custom distribution after using scan

Quick context

MRE

Additional comments

Versions

Update

Versions

Related topics