# Posterior predictive distributions of parameter values differs a lot from posterior?

I must have missed this part in the bayesian statistics course.
Consider the code below:

``````import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import pandas as pd
import pytensor.tensor as tt

def generate_input_data(timerange):
import random
return np.array([random.uniform(0.5, 0.7) for x in range(len(timerange))]).reshape(-1, 1)

def get_coefficient_val(date, coefficientknots):
coefficientval = 0
for knot, val in coefficientknots.items():
lb, ub = map(int, knot.split("-"))
if lb <= date.day <= ub:
coefficientval = val
break
return coefficientval

def get_exponent_val(date, exponentknots):
exponentval = 1.0
for exponentknot, val in exponentknots.items():
lb, ub = map(int, exponentknot.split("-"))
if lb <= date.day <= ub:
exponentval = val
break
return exponentval

def compute_true_values(timerange, coefficientknots, exponentknots):
true_values = []
for idx, date in enumerate(timerange):
coefficientval = get_coefficient_val(date, coefficientknots)
exponentval = get_exponent_val(date, exponentknots)
if coefficientval == 0 or exponentval == 1.0:
raise ValueError("Invalid coefficient or exponent value")
val = generate_input_data([date])[0]
entry = coefficientval * val ** exponentval
true_values.append(entry)
return np.array(true_values)

def main():

timerange = pd.date_range(start="2023-01-01", end="2023-07-24", freq='D')
coefficientknots = {'0-15': 2.0, '15-32': 3.0}
exponentknots = {'0-15': 0.2, '15-32': 0.3}
data = generate_input_data(timerange)
true_values = compute_true_values(timerange, coefficientknots, exponentknots)

# Gather stuff into a dataframe for organization
df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
df['day'] = df.index.day

# Create the map by assiging a bin index to each datapoint based on which day it falls into
knot_cutpoints = [0, 15, 32]
df['knot_idx'] = pd.cut(df['day'], knot_cutpoints, labels=False)

# Train-test split includes the knot mapping
df_train = df.iloc[:140].copy()
df_test = df.iloc[140:].copy()

# These names will be used to make the model output look nicer
knot_names = [f'{a}-{b}' for a, b in zip(knot_cutpoints[:-1], knot_cutpoints[1:])]

# Coords let us refer to dimension by their name, rather than having to remember their shapes
coords = {'knot': knot_names}

with pm.Model(coords=coords) as model:
data_pt = pm.MutableData('input', df_train.data)
targets_pt = pm.MutableData(f'convresponse', df_train.targets)

# Here is the mapping; when we do out-of-sample prediction we'll update this as well
knot_idx_pt = pm.MutableData('knot_idx', df_train.knot_idx)

# Using coords, draw all the parameters for all knots at once
knot_coef = pm.Normal(f'knot_coef', 0, 1, dims=['knot'])
knot_exponent = pm.Gamma(f'knot_exponent', 3, 3, dims=['knot'])

# Use the mapping defined above to assign the correct coefficient to the
# correct observation
modelledresponse = tt.power(knot_coef[knot_idx_pt] * data_pt,
knot_exponent[knot_idx_pt])

sigma = pm.HalfCauchy(name="sigma", beta=10)
outcome = pm.Normal(name="outcome",
mu=modelledresponse,
sigma=sigma,
observed=targets_pt,
shape=data_pt.shape[0],  # Important!
initval={'knot_exponent': [1, 1]})  # I found sometimes the exponents blow up

import pymc.sampling_jax as jax
trace = jax.sample_blackjax_nuts(1000,
tune=1000,
target_accept=0.97,
cores=2,
random_seed=0,
idata_kwargs={"log_likelihood": True},
return_inferencedata=True)
# idata = pm.sample()

with model:
# idata = pm.sample_posterior_predictive(trace, extend_inferencedata=False)
pm.set_data({'input': df_test.data, 'knot_idx': df_test.knot_idx})
idata = pm.sample_posterior_predictive(trace, var_names=[x for x in trace.posterior],
extend_inferencedata=False, predictions=True)
sortedbasedonindex = idata.predictions["knot_coef"].isel(knot=df_test.knot_idx.to_numpy())
print("_")

if __name__ == '__main__':
main()
``````

Accessing idata.predictions.knot_exponent.data we see that we sampled values from the range [0.007, 4.66] even though the beta distribution solely has support on [0, 1]. The parameter estimates for the posterior predictive distribution is far off from the normal posterior, why is that? i must have missed something very basic in my understanding of bayesian inference.

I think youâ€™re misusing sample_posterior_predictive. This function doesnâ€™t just generate draws from the posterior. Instead ,it generates draws from a target variable by replacing all other variables on which it depends with their posteriors. See this post for an explanation.

Also it looks like `knot_exponent` has a Gamma prior, not a Beta prior, so the reported range isnâ€™t unexpected?

ah i see, you are absolutely correct.

The reason i am asking is since i want to map up posteriors over an horizon/testdata with the use of pm.Deterministic, look at the code below:

``````import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import pandas as pd
import pytensor.tensor as tt

def generate_input_data(timerange):
import random
return np.array([random.uniform(0.5, 0.7) for x in range(len(timerange))]).reshape(-1, 1)

def get_coefficient_val(date, coefficientknots):
coefficientval = 0
for knot, val in coefficientknots.items():
lb, ub = map(int, knot.split("-"))
if lb <= date.day <= ub:
coefficientval = val
break
return coefficientval

def get_exponent_val(date, exponentknots):
exponentval = 1.0
for exponentknot, val in exponentknots.items():
lb, ub = map(int, exponentknot.split("-"))
if lb <= date.day <= ub:
exponentval = val
break
return exponentval

def compute_true_values(timerange, coefficientknots, exponentknots):
true_values = []
for idx, date in enumerate(timerange):
coefficientval = get_coefficient_val(date, coefficientknots)
exponentval = get_exponent_val(date, exponentknots)
if coefficientval == 0 or exponentval == 1.0:
raise ValueError("Invalid coefficient or exponent value")
val = generate_input_data([date])[0]
entry = coefficientval * val ** exponentval
true_values.append(entry)
return np.array(true_values)

def main():

timerange = pd.date_range(start="2023-01-01", end="2023-07-24", freq='D')
coefficientknots = {'0-15': 2.0, '15-32': 3.0}
exponentknots = {'0-15': 0.2, '15-32': 0.3}
data = generate_input_data(timerange)
true_values = compute_true_values(timerange, coefficientknots, exponentknots)

# Gather stuff into a dataframe for organization
df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
df['day'] = df.index.day

# Create the map by assiging a bin index to each datapoint based on which day it falls into
knot_cutpoints = [0, 15, 32]
df['knot_idx'] = pd.cut(df['day'], knot_cutpoints, labels=False)

# Train-test split includes the knot mapping
df_train = df.iloc[:140].copy()
df_test = df.iloc[140:].copy()

# These names will be used to make the model output look nicer
knot_names = [f'{a}-{b}' for a, b in zip(knot_cutpoints[:-1], knot_cutpoints[1:])]

# Coords let us refer to dimension by their name, rather than having to remember their shapes
coords = {'knot': knot_names}

with pm.Model(coords=coords) as model:
data_pt = pm.MutableData('input', df_train.data)
targets_pt = pm.MutableData(f'convresponse', df_train.targets)

# Here is the mapping; when we do out-of-sample prediction we'll update this as well
knot_idx_pt = pm.MutableData('knot_idx', df_train.knot_idx)

# Using coords, draw all the parameters for all knots at once
knot_coef = pm.Normal(f'knot_coef', 0, 1, dims=['knot'])
random_coef = pm.Normal(f'random_coef', 0, 1)

pm.Deterministic(name="coefficients", var=random_coef * knot_coef[knot_idx_pt])
knot_exponent = pm.Gamma(f'knot_exponent', 3, 3, dims=['knot'])

# Use the mapping defined above to assign the correct coefficient to the
# correct observation
modelledresponse = tt.power(random_coef * knot_coef[knot_idx_pt] * data_pt,
knot_exponent[knot_idx_pt])

sigma = pm.HalfCauchy(name="sigma", beta=10)
outcome = pm.Normal(name="outcome",
mu=modelledresponse,
sigma=sigma,
observed=targets_pt,
shape=data_pt.shape[0],  # Important!
initval={'knot_exponent': [1, 1]})  # I found sometimes the exponents blow up

import pymc.sampling_jax as jax
trace = jax.sample_blackjax_nuts(1000,
tune=1000,
target_accept=0.97,
cores=2,
random_seed=0,
idata_kwargs={"log_likelihood": True},
return_inferencedata=True)
# idata = pm.sample()

with model:
# idata = pm.sample_posterior_predictive(trace, extend_inferencedata=False)
pm.set_data({'input': df_test.data, 'knot_idx': df_test.knot_idx})
idata = pm.sample_posterior_predictive(trace, var_names=[x for x in trace.posterior],
extend_inferencedata=False, predictions=True)
sortedbasedonindex = trace.posterior["knot_exponent"].isel(knot=df_test.knot_idx.to_numpy())

sortedbasedonnothing = trace.posterior["coefficients"]  # ?
print("_")

if __name__ == '__main__':
main()
``````

sortedbasedonindex returns the right thing but for the â€śdeterministic variableâ€ť( â€ścoefficientsâ€ť) i dont know how to do it. I want to do the same thing i did with sortedbasedonindex but with sortedbasedonnothing.
This is obviously just an toy-example but it mimics my real life scenario where i have some interactions that i wrap in Deterministic variables, but i cant figure out how to project them over an horizon/testdata such that it gets mapped up correctly.

With the posterior_predictive it was really simple but with the plain posterior i dont see any other way to do it than to map it up manually?

You may only want to only include the deterministics in the `var_names` kwarg for `sample_posterior_predictive` not every variable in the trace. Something like `var_names=[det.name for det in model.deterministics]`

Why not add dims to `coefficients`?

also thought about it but i have issues trying to understand how that would turn out, lets say i have several such knotmappings as the one above but for different intervals, i multiply the corresponding RVâ€™s together based on the indexes and place them in a deterministic variable. I understand how a single knot e.g can be mapped up on the testdata but several ones, i cant index all those knots together in an array?

``````        knot_coef = pm.Normal(f'knot_coef', 0, 1, dims=['knot'])
random_coef = pm.Normal(f'random_coef', 0, 1)
knot_coef_2 = pm.Normal(f'knot_coef_2', 0, 1, dims=["different than knot-coords"])
pm.Deterministic(name="coefficients",
var=random_coef * knot_coef[knot_idx_pt] * knot_coef_2[knot_idx_2_pt])
``````

I dont understand how i can set a dim on the deterministic in this caseâ€¦ maybe i should just go under the hood and find out but it will take quite a while i assume

The deterministic will end up with the dataâ€™s index (that is, the DateTimeIndex). Iâ€™d add a new `coords_mutable` with the dates, then put those dates everywhere where they belong (data, targets, coefficients, and modeled_response).

When you switch to the test data you can also switch the dates. Then just select whatever you want by selecting on dates.

Also donâ€™t ignore @ricardoV94 's advice to add the deterministics to `var_names` when you `sample_posterior_predictive`. That will work as you expect after you switch the data.

i must say i am lost; i cant see how it can now which variables to map up where just based on the datetimeindex?
i dont want to sample from the posterior predictive but instead map up the posterior for the Deterministic over the testdata.

If the deterministic itself is composed of several different mappings towards days, months, maybe day of the week. How can the datetimeindex coords and dims map that up over the testdata, sounds like magic under the hood?

``````sortedbasedonindex = trace.posterior["knot_exponent"].isel(knot=df_test.knot_idx.to_numpy())
coefficientsonindex = trace.posterior["coefficients"].isel(knot=df_test.date.to_numpy())
``````

The first line gets me precisely what i want but the second one, since coefficients consists of several different mappings over months, days, maybe day of the week how can a single datetime indexing sort that out, the only way i can figure out how to do it is to manually go through each and every one mapping and then multiply them together as i did with the deterministic variable?

Or can one really do something like this:

``````import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import pandas as pd
import pytensor.tensor as tt

def generate_input_data(timerange):
import random
return np.array([random.uniform(0.5, 0.7) for x in range(len(timerange))]).reshape(-1, 1)

def get_coefficient_val(date, coefficientknots):
coefficientval = 0
for knot, val in coefficientknots.items():
lb, ub = map(int, knot.split("-"))
if lb <= date.day <= ub:
coefficientval = val
break
return coefficientval

def get_exponent_val(date, exponentknots):
exponentval = 1.0
for exponentknot, val in exponentknots.items():
lb, ub = map(int, exponentknot.split("-"))
if lb <= date.day <= ub:
exponentval = val
break
return exponentval

def compute_true_values(timerange, coefficientknots, exponentknots):
true_values = []
for idx, date in enumerate(timerange):
coefficientval = get_coefficient_val(date, coefficientknots)
exponentval = get_exponent_val(date, exponentknots)
if coefficientval == 0 or exponentval == 1.0:
raise ValueError("Invalid coefficient or exponent value")
val = generate_input_data([date])[0]
entry = coefficientval * val ** exponentval
true_values.append(entry)
return np.array(true_values)

def main():

timerange = pd.date_range(start="2023-01-01", end="2023-07-24", freq='D')
coefficientknots = {'0-15': 2.0, '15-32': 3.0}
exponentknots = {'0-15': 0.2, '15-32': 0.3}
data = generate_input_data(timerange)
true_values = compute_true_values(timerange, coefficientknots, exponentknots)

# Gather stuff into a dataframe for organization
df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
df['day'] = df.index.day
df['month'] = df.index.month
df['date'] = df.index.to_numpy()
# Create the map by assiging a bin index to each datapoint based on which day it falls into
knot_cutpoints = [0, 15, 32]
df['knot_idx'] = pd.cut(df['day'], knot_cutpoints, labels=False)

knot_cutpoints_2 = [0, 4, 7, 12]
df['knot_idx_2'] = pd.cut(df['month'], knot_cutpoints_2, labels=False)
# Train-test split includes the knot mapping
df_train = df.iloc[:140].copy()
df_test = df.iloc[140:].copy()

# These names will be used to make the model output look nicer
knot_names = [f'{a}-{b}' for a, b in zip(knot_cutpoints[:-1], knot_cutpoints[1:])]

knot_names_2 = [f'{a}-{b}' for a, b in zip(knot_cutpoints_2[:-1], knot_cutpoints_2[1:])]
# Coords let us refer to dimension by their name, rather than having to remember their shapes
coords = {'knot': knot_names,
'knot2': knot_names_2,
'date': df_train.index}

with pm.Model(coords=coords) as model:
data_pt = pm.MutableData('input', df_train.data)
targets_pt = pm.MutableData(f'convresponse', df_train.targets)

# Here is the mapping; when we do out-of-sample prediction we'll update this as well
knot_idx_pt = pm.MutableData('knot_idx', df_train.knot_idx)

knot_idx_pt_2 = pm.MutableData('knot_idx_2', df_train.knot_idx_2)
# Using coords, draw all the parameters for all knots at once
knot_coef = pm.Normal(f'knot_coef', 0, 1, dims=['knot'])
random_coef = pm.Normal(f'random_coef', 0, 1)
random_coef_2 = pm.Normal(f'random_coef_2', 0, 1, dims=['knot2'])
# knot_coef_2 = pm.Normal(f'knot_coef_2', 0, 1, dims=["different than knot-coords"])
pm.Deterministic(name="coefficients",
var=random_coef * knot_coef[knot_idx_pt] * random_coef_2[knot_idx_pt_2],
dims=['date'])

knot_exponent = pm.Gamma(f'knot_exponent', 3, 3, dims=['knot'])

# Use the mapping defined above to assign the correct coefficient to the
# correct observation
modelledresponse = tt.power(random_coef * knot_coef[knot_idx_pt] * data_pt,
knot_exponent[knot_idx_pt])

sigma = pm.HalfCauchy(name="sigma", beta=10)
outcome = pm.Normal(name="outcome",
mu=modelledresponse,
sigma=sigma,
observed=targets_pt,
shape=data_pt.shape[0],  # Important!
initval={'knot_exponent': [1, 1]})  # I found sometimes the exponents blow up

import pymc.sampling_jax as jax
trace = jax.sample_blackjax_nuts(1000,
tune=1000,
target_accept=0.97,
cores=2,
random_seed=0,
idata_kwargs={"log_likelihood": True},
return_inferencedata=True)
# idata = pm.sample()

with model:
pm.set_data({'input': df_test.data, 'knot_idx': df_test.knot_idx})
sortedbasedonindex = trace.posterior["knot_exponent"].isel(knot=df_test.knot_idx.to_numpy())
coefficientsonindex = trace.posterior["coefficients"].isel(knot=df_test.date.to_numpy())

print("_")

if __name__ == '__main__':
main()
``````

that would be magic?

Maybe Iâ€™m the one who doesnâ€™t understand. Suppose the value of `coefficients` is `[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]`. Recall that, because of the indexing, `coefficients` has the same shape as the data, so in my example, the data has shape `(10,)`, and each `coefficient` already matches a row of data. What is your expected output of `coefficients_on_index`?

If you just want to update it so that it matches the test values, use `pm.set_data` together with `pm.sample_posterior_predictive`, passing `coefficients` to var_names.

hold on, i might have misinterpreted your example in https://discourse.pymc.io/t/interpretation-of-posterior-predictive-checks-for-a-gaussian-process/12415/9 .

I think this example here below will really showcase what i want to get at(thanks for your patience).

Consider the code below:

``````import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import pandas as pd
import pytensor.tensor as tt

def generate_input_data(timerange):
import random
return np.array([random.uniform(0.5, 0.7) for x in range(len(timerange))]).reshape(-1, 1)

def get_coefficient_val_month(date, coefficientknots):
coefficientval = 0
for knot, val in coefficientknots.items():
lb, ub = map(int, knot.split("-"))
if lb <= date.month <= ub:
coefficientval = val
break
return coefficientval

def get_coefficient_val(date, coefficientknots):
coefficientval = 0
for knot, val in coefficientknots.items():
lb, ub = map(int, knot.split("-"))
if lb <= date.day <= ub:
coefficientval = val
break
return coefficientval

def get_exponent_val(date, exponentknots):
exponentval = 1.0
for exponentknot, val in exponentknots.items():
lb, ub = map(int, exponentknot.split("-"))
if lb <= date.day <= ub:
exponentval = val
break
return exponentval

def compute_true_values(timerange, coefficientknots, coefficientknotsmonths, exponentknots):
true_values = []
for idx, date in enumerate(timerange):
coefficientval = get_coefficient_val(date, coefficientknots)
coefficientvalmonth = get_coefficient_val_month(date, coefficientknotsmonths)
exponentval = get_exponent_val(date, exponentknots)
if coefficientval == 0 or exponentval == 1.0:
raise ValueError("Invalid coefficient or exponent value")
val = generate_input_data([date])[0]
entry = coefficientvalmonth * coefficientval * val ** exponentval
true_values.append(entry)
return np.array(true_values)

def main():

timerange = pd.date_range(start="2023-01-01", end="2023-07-24", freq='D')
coefficientknots = {'0-15': 2.0, '15-32': 3.0}
exponentknots = {'0-15': 0.2, '15-32': 0.3}
coefficientknotsmonths = {'0-6': 4.0, '6-13': 7.0}
data = generate_input_data(timerange)
true_values = compute_true_values(timerange, coefficientknots, coefficientknotsmonths, exponentknots)

# Gather stuff into a dataframe for organization
df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
df['day'] = df.index.day

df['month'] = df.index.month

# Create the map by assiging a bin index to each datapoint based on which day it falls into
knot_cutpoints = [0, 15, 32]
df['knot_idx'] = pd.cut(df['day'], knot_cutpoints, labels=False)

knot_cutpoints_month = [0, 6, 13]
df['knot_idx_month'] = pd.cut(df['month'], knot_cutpoints_month, labels=False)

# Train-test split includes the knot mapping
df_train = df.iloc[:140].copy()
df_test = df.iloc[140:].copy()

# These names will be used to make the model output look nicer
knot_names = [f'{a}-{b}' for a, b in zip(knot_cutpoints[:-1], knot_cutpoints[1:])]

knot_names_month = [f'{a}-{b}' for a, b in zip(knot_cutpoints_month[:-1], knot_cutpoints[1:])]
# Coords let us refer to dimension by their name, rather than having to remember their shapes
coords = {'knot': knot_names,
'knot_names_month': knot_names_month}

with pm.Model(coords=coords) as model:
data_pt = pm.MutableData('input', df_train.data)
targets_pt = pm.MutableData(f'convresponse', df_train.targets)

knot_idx_pt = pm.MutableData('knot_idx', df_train.knot_idx)

knot_idx_pt_month = pm.MutableData('knot_idx_month', df_train.knot_idx_month)

knot_coef = pm.Normal(f'knot_coef', 0, 1, dims=['knot'])
knot_exponent = pm.Gamma(f'knot_exponent', 3, 3, dims=['knot'])

exp = pm.Deterministic('knot_exponent_deterministic', var=knot_exponent[knot_idx_pt])

knot_coef_month = pm.Normal(f'knot_coef_month', 0, 1, dims=['knot_names_month'])

coefficients = pm.Deterministic(name="coefficients",
var=knot_coef[knot_idx_pt] * knot_coef_month[knot_idx_pt_month])

modelledresponse = tt.power(coefficients * data_pt,
exp)

sigma = pm.HalfCauchy(name="sigma", beta=10)
outcome = pm.Normal(name="outcome",
mu=modelledresponse,
sigma=sigma,
observed=targets_pt,
shape=data_pt.shape[0],  # Important!
initval={'knot_exponent': [1, 1]})  # I found sometimes the exponents blow up

import pymc.sampling_jax as jax
trace = jax.sample_blackjax_nuts(1000,
tune=1000,
target_accept=0.97,
cores=2,
random_seed=0,
idata_kwargs={"log_likelihood": True},
return_inferencedata=True)

with model:
# idata = pm.sample_posterior_predictive(trace, extend_inferencedata=False)
pm.set_data({'input': df_test.data, 'knot_idx': df_test.knot_idx, 'knot_idx_month': df_test.knot_idx_month})
idata = pm.sample_posterior_predictive(trace, random_seed=0, var_names=["outcome",
"knot_coef", "knot_exponent",
"knot_exponent_deterministic",
"coefficients"],
extend_inferencedata=False, predictions=True)
exponents = idata.predictions["knot_exponent_deterministic"]
coefficients = idata.predictions["coefficients"]

idata_new = pm.sample_posterior_predictive(trace, random_seed=0, extend_inferencedata=False,
var_names=["knot_exponent_deterministic", "coefficients"],
predictions=True)
print("-")

if __name__ == '__main__':
main()
``````

To my understanding, the variables exponents and coefficients here below will correspond to PRIOR samples and not the posterior samples used to compute the posterior predictive distribution:

``````idata = pm.sample_posterior_predictive(trace, random_seed=0, var_names=["outcome","knot_coef", "knot_exponent","knot_exponent_deterministic","coefficients"],
extend_inferencedata=False, predictions=True)
exponents = idata.predictions["knot_exponent_deterministic"]
coefficients = idata.predictions["coefficients"]
``````

and the same applies here:

``````idata_new = pm.sample_posterior_predictive(trace, random_seed=0, extend_inferencedata=False,
var_names=["knot_exponent_deterministic", "coefficients"],
predictions=True)
``````

But i might have misunderstood your example, if var_names include the outcome, then the other var_names yields the posterior samples from which it samples the posterior predictive.

If var_names does not include the outcome, then var_names will correspond to priors?

Thus including â€śoutcomeâ€ť or â€śyâ€ť or whatever your target is in var_names will automatically generate the posterior samples for the variables the target DEPENDS on. If it is not included it will generate the priors?

If that is the case we are set and to my understanding then exponents and coefficients above correspond to what i want, namely posteriors mapped over the testdata.

If it is not the case, that exponents = idata.predictions[â€śknot_exponent_deterministicâ€ť] above correspond to PRIORS i am screwed.

unfortunately i was wrongâ€¦

``````idata_new = pm.sample_posterior_predictive(trace, random_seed=0, extend_inferencedata=False,
var_names=["knot_exponent_deterministic", "coefficients"],
predictions=True)
idata_new_2 = pm.sample_posterior_predictive(trace, random_seed=0, extend_inferencedata=False,
var_names=["outcome", "knot_exponent_deterministic", "coefficients"],
predictions=True)
``````

they resulted in the same coefficients and exponentsâ€¦ so i am screwed.

Lets say testdata consists of the following dates [2023-04-01, 2023-04-02, 2023-04-03]
I have some mappings for knots and interactions between the RVâ€™s. Lets say one RV covers all dates, another one just the first of every month in my dataset, these two are multiplied in an interaction term.
I want to generate posterior samples of these RVâ€™s corresponding to the dates in the testdata without manually adding it up. Thus in this case, i want posterior samples ordered like this; the first entry correspond to the product of the two(first RV covers the first of every month and the second all dates), the second entry solely correspond to the one covering all dates, and similarly for the last entry.

Seems so strange that the sample_posterior_predictive generate PRIORS from our RVâ€™s even though it uses the posteriors to sample the outcome, i dont get itâ€¦

Posterior predictive yields forward draws conditioned on the values of other variables in the trace. Except when an RV depends on a MutableData or mutable coords whose values have changed or another RV that is being resampled. In those cases itâ€™s also resampled and the values in the trace are ignored.

Also, every RV in `var_names` is resampled (otherwise we would just be copying values from the posterior, which is not a very interesting behavior).

A variable being resampled only corresponds to the prior if no values from the trace influence its draws.

You should be able to see any variables that are being resampled in the printed lines when you call `sample_posterior_predictive`

``````with pm.Model() as m:
x = pm.Normal("x")
y = pm.Normal("y")
z = pm.Normal("z", x + y**2)
det = pm.Deterministic("det", exp(z))
obs = pm.Normal("obs", det, 1, observed=data)

idata = pm.sample()

# Default behavior
pm.sample_posterior_predictive(idata, var_names=["obs"])
# Sampling: [obs]

# Deterministic mapping
pm.sample_posterior_predictive(idata, var_names=["det"])
# Sampling: []

# Posterior predictive of z and new det given posterior of x and y
# This corresponds to the default behavior except z is not an observed variable
pm.sample_posterior_predictive(idata, var_names=["z", "det"])
# Sampling: [z]

# z is resampled based on the prior of y (since no other variables influence y),
# but it's still conditioned on the posterior of x
# Regarding det, it's equivalent to `var_names=["y", "z", "det"]`
pm.sample_posterior_predictive(idata, var_names=["y", "det"])
# Sampling: [y, z]

# Everything is resampled, same as prior
# Regarding det, it's equivalent to `var_names=["x", "y", "z", "det"]`
pm.sample_posterior_predictive(idata, var_names=["x", "y", "det"])
# Sampling: [x, y, z]
``````
2 Likes

this means that in my case i should just be able to wrap the variables i am interested in â€śprojectingâ€ť in an deterministic variable and then sample from the posterior predictive solely for these deterministic variables as i understand it.

However, i still dont get it:
I renamed coefficients to coefficients_deterministic just to make it clear:

``````        trace = jax.sample_blackjax_nuts(1000,
tune=1000,
target_accept=0.97,
cores=2,
random_seed=0,
idata_kwargs={"log_likelihood": True},
return_inferencedata=True)

with model:
# idata = pm.sample_posterior_predictive(trace, extend_inferencedata=False)
pm.set_data({'input': df_test.data, 'knot_idx': df_test.knot_idx, 'knot_idx_month': df_test.knot_idx_month})
idata_new = pm.sample_posterior_predictive(trace, random_seed=0, extend_inferencedata=False,
var_names=["knot_exponent_deterministic",
"coefficients_deterministic"],
predictions=True)
coeff_deterministic = idata_new.predictions.coefficients_deterministic
knot_exponent_deterministic = idata_new.predictions.knot_exponent_deterministic
``````

knot_exponent_deterministic does indeed correspond to the posterior samples and hopefully mapped up correctly w.r.t dates.

However, coeff_deterministic does not correspond to the posterior samples(min values are not the same but the max values are) and to me this doesnt make sense since you essentially did the exact same thing in your example for pm.sample_posterior_predictive(idata, var_names=[â€śdetâ€ť]) , it is more or less the exact same setup, what am i getting wrong?

Whatâ€™s the printed output: `Sampling: [?]`? when you call posterior predictive?

Edit: I didnâ€™t check your model structure carefully but remember that any RV that depends on a MutableData whose values have changed is also resampled.

Edit2: I checked now and it seems like thatâ€™s not your case.

I tried to recreate your example in Colab and `coefficients_deterministic` seem to map just fine. To compare I choose the cases where the `knot_idx == 0` and `knot_idx_month == 0` in both the train and test dataset.

I did this because you donâ€™t seem to have `knot_idx_month==1` in the train dataset, which is probably a bug? That means, that without indexing them away, there would be values in the predictions deterministic that you never saw in the posterior deterministic (itâ€™s a new combination after all).

3 Likes

you are absolutely correct, lovely. The deterministic-variable solved itâ€¦ it is really convenient to be able to map it up like this over an horizon.

do you have any idea on how to approach this problem: Using dims and MutableData to index non-existing/zero-value variables in splines ?

be careful of the dummy variable trap: see Using dims and MutableData to index non-existing/zero-value variables in splines