Hi all,
While building a time series model I’m encountering the following error: ValueError: traffic_innov is not in list
. When I look at the list in question, I do find a variable named traffic_innov
, so my best hypothesis at this point is that a copy is being made somewhere and then equality comparison fails despite the objects being identical from the modelling perspective.
Any help would be very much appreciated
Quick context
It’s a model of arrival times which depend on traffic. Traffic has a persistent behaviour (modelled with an AR) and is also affected by weather. Basically there are three parameters, a scan loop and a likelihood (times have been floored, hence the use of a shifted geometric instead of exponential).
I’ve tried to reduce the model’s complexity as much as possible, so the code below is not entirely representative of what I’m trying to do but it serves to reproduce the error I’m stuck with.
MRE
import numpy as np
import pandas as pd
import pymc as pm
from pymc.pytensorf import collect_default_updates
import pytensor
# Create fake data
lags = 1
n_stops = 5
steps = n_stops - lags
n_trips = 2
stop_sequence = np.arange(n_stops, dtype=int)
trips = np.arange(n_trips, dtype=int)
delay_delta = np.array(
[
[1, 1, 1, 2, 4],
[2, 2, 1, 1, 1],
]
)
weather_states = ["sunny", "rainy", "storm"]
weather = np.array(
[
[0, 0, 0, 1, 2],
[1, 1, 1, 0, 0],
]
)
delay_df = pd.DataFrame(delay_delta, columns=stop_sequence)
weather_df = pd.DataFrame(weather, columns=stop_sequence)
data = pd.concat([delay_df, weather_df], axis=1, keys=["delay", "weather"])
coords = {
"weather_states": weather_states,
"lags": range(-lags, 0),
"stops": stop_sequence,
"steps": stop_sequence[:-lags],
"instances": trips,
}
def ar_step(weather, traffic_t1, rho, mu, sigma):
innov = pm.LogNormal.dist(mu=mu[weather[0]], sigma=sigma[weather[0]])
traffic = traffic_t1 * rho[0] + innov * (1 - rho[0])
return traffic, collect_default_updates([traffic])
def ar_dist(ar_init, weather, rho, mu, sigma, n_steps, size):
ar_innov, _ = pytensor.scan(
fn=ar_step,
sequences=[
{"input": weather, "taps": 0},
],
outputs_info=[{"initial": ar_init, "taps": [-lags]}],
# outputs_info=[{"initial": ar_init, "taps": range(-lags, 0)}],
non_sequences=[rho, mu, sigma],
n_steps=n_steps,
strict=True,
)
return ar_innov
def geom_shift(scale, size):
prob = 1 - pm.math.exp(-1 / scale)
geom = pm.Geometric.dist(p=prob, size=size)
return geom - 1
with pm.Model(coords=coords) as ar_model:
# Data
delay_follow = pm.Data(
"delay_follow_obs",
data["delay"].values[:, 1:],
dims=("instances", "steps"),
)
weather_first = pm.Data(
"weather_first_obs", data["weather"].values[:, 1], dims=("instances",)
)
weather_follow = pm.Data(
f"weather_follow_obs",
data["weather"].values[:, 1:],
dims=(
"instances",
"steps",
),
)
# Parameters
rho = pm.Beta("rho", 5, 1, dims=("lags",)) # traffic persistence
traffic_weather_mu = pm.Normal(
"traffic_weather_mu", 0, 1, dims=("weather_states",)
)
traffic_weather_sigma = pm.HalfNormal(
"traffic_weather_sigma", 1, dims=("weather_states",)
)
# AR Model
traffic_init = [
pm.LogNormal(
f"init{idx}",
traffic_weather_mu[weather],
traffic_weather_sigma[weather],
)
for idx, weather in enumerate(weather_first)
]
traffic_innov = pm.CustomDist(
f"traffic_innov",
traffic_init,
weather_follow.reshape((steps, -1)),
rho,
traffic_weather_mu,
traffic_weather_sigma,
steps,
dist=ar_dist,
dims=(
f"steps",
f"instances",
), # scan iterates over leftmost dimension
)
# Likelihood
pm.CustomDist(
"delay",
traffic_innov.T,
dist=geom_shift,
observed=delay_follow,
dims=("instances", "steps"),
)
ar_model.point_logps()
with ar_model:
idata = pm.sample()
Output:
...
File ~\miniconda3\envs\bus\lib\site-packages\pymc\logprob\basic.py:516, in <listcomp>(.0)
513 node_rvs = [valued_var.inputs[0] for valued_var in valued_nodes]
514 node_values = [valued_var.inputs[1] for valued_var in valued_nodes]
515 node_output_idxs = [
--> 516 fgraph.outputs.index(valued_var.outputs[0]) for valued_var in valued_nodes
517 ]
519 # Replace `RandomVariable`s in the inputs with value variables.
520 # Also, store the results in the `replacements` map for the nodes that follow.
521 for node_rv, node_value in zip(node_rvs, node_values):
ValueError: traffic_innov is not in list
Additional comments
But when I check fgraph.outputs
it does seem to have traffic_innov
:
(Pdb++) fgraph.outputs
[ValuedRV.0, traffic_weather_mu, ValuedRV.0, ValuedRV.0, ValuedRV.0, traffic_innov, delay]
If I replace the shifted geometric (custom distribution) by the standard geometric it works well (sampling is full of divergences but it’s not like I’m feeding it sensible data), so that seems to be playing a key role.
pm.Geometric(
"delay",
p=1 - pm.math.exp(-traffic_innov.T),
observed=delay_follow,
dims=("instances", "steps"),
)
Still I have used custom distributions without trouble and it seems to me there’s something in the interaction with scan that’s leading to the error. I’ve tried a similar model in which traffic is not affected by weather (and hence there’s only one traffic_init
variable) and there everything works fine.
Versions
- pymc: 5.17 and 5.19.1
- pytensor: 2.25.5 and 2.26.4