Difficulty plotting multi-switchpoint posterior distributions

Hi,

I am trying to plot the multi-switchpoint posterior distributions, but the graph output looks different from the posterior distributions, can anyone tell me what I am doing wrong, please?

The results of the posterior distributions are:

And my code for the plot is below

early_rate_samples = trace['lambda_1']
late_rate_samples = trace['lambda_2']
later_rate_samples = trace['lambda_3']
changepoint_samples = trace['tau_1']
changepoint_2_samples = trace['tau_2']

#N samples from the corresponding posterior distribution.
N = ((changepoint_samples.shape[0]) + (changepoint_2_samples.shape[0]))

expected_Outbreaks_per_week = np.zeros(number_of_weeks)
for week in range(0, number_of_weeks):
    # ix is a bool index of all changepoint samples corresponding to
    # the changepoint occurring prior to value of 'week'
    ix = (week <= changepoint_samples) & (changepoint_2_samples >= week)
    expected_Outbreaks_per_week[week] = (early_rate_samples[ix].sum() + late_rate_samples[ix].sum() + later_rate_samples[~ix].sum()) / N

plt.plot(range(number_of_weeks), expected_Outbreaks_per_week, lw=4, color="#E24A33",
         label="expected number of Outbreak incident")
plt.xlim(0, number_of_weeks)
plt.xlabel("Time (weeks)")
plt.ylabel("Expected # Outbreaks")
plt.title("Expected number of Outbreak Incident")
plt.ylim(0, 15)
plt.bar(np.arange(len(twitter_df['twitter_stream_count'])), twitter_df['twitter_stream_count'], color="#348ABD", alpha=0.65,
        label="observed Outbreak per week")

plt.legend(loc="upper left");

The graph below plotted from the code looks completely way off the posterior distribution outputs.

image

Can you tell me what I am doing wrong here, please?

Thanks

Can you share the output of the corresponding InferenceData? Without this nor the model it’s difficult to make sense of some of the code like the definition of N.

It also looks like ix is not doing what you want it to do. ix indicates if the week is between the two changepoints only, and you are using both early rate and late rate when it’s true, later rate when it’s false. It should probably be something like:

early_mask = (week <= changepoint_samples)
later_mask = (week >= changepoint_2_samples)
late_mask = ~(early_mask | later_mask)  # equal to (week > changepoint_samples) & (week < changepoint_2_samples)
expected_Outbreaks_per_week[week] = (early_rate_samples[early_mask].sum() + late_rate_samples[late_mask].sum() + later_rate_samples[later_mask].sum()) / N

Thank you for your reply.

The model is as follows

with pm.Model() as twitter_model:

    alpha = pm.Uniform('alpha', lower=0, upper=100)
    #alpha = pm.Gamma('alpha', mu=50, sigma=20)

    #lambda_1 = pm.Exponential("lambda_1", alpha)
    #lambda_2 = pm.Exponential("lambda_2", alpha)
    #lambda_3 = pm.Exponential("lambda_3", alpha)
    
    lambda_1 = pm.Uniform("lambda_1", lower=0, upper=20)
    lambda_2 = pm.Uniform("lambda_2", lower=0, upper=20)
    lambda_3 = pm.Uniform("lambda_3", lower=0, upper=20)

    tau_1 = pm.Uniform("tau_1", lower=0, upper= number_of_weeks - 1)
    tau_2 = pm.Uniform("tau_2", lower=tau_1, upper= number_of_weeks - 1)

    idx = np.arange(number_of_weeks) # Index
    
    lamda_0 = pm.math.switch(tau_1 >= idx, lambda_1, lambda_2)
    lambda_ = pm.math.switch(tau_2 >= idx, lamda_0, lambda_3)

    observation = pm.NegativeBinomial("obs", lambda_, alpha, observed=twitter_obs)
with twitter_model:

 trace = pm.sample(1000, cores=2, tune=2000)
 ````

````
Sampling 2 chains for 2_000 tune and 1_000 draw iterations (4_000 + 2_000 draws total) took 475 seconds.
The estimated number of effective samples is smaller than 200 for some parameters.
````



Updating the plot with your suggestion gives a reasonable plot, this is exactly what I am looking for. Thanks.

````
#N samples from the corresponding posterior distribution.

N = (changepoint_samples.shape[0]) 

expected_Outbreaks_per_week = np.zeros(number_of_weeks)

for week in range(0, number_of_weeks):

    # ix is a bool index of all changepoint samples corresponding to

    # the changepoint occurring prior to value of 'week'

    #ix = (week <= changepoint_samples) & (changepoint_2_samples >= week)

    early_mask = (week <= changepoint_samples)

    later_mask = (week >= changepoint_2_samples)

    late_mask = ~(early_mask | later_mask)  # equal to (week > changepoint_samples) & (week < changepoint_2_samples)

    expected_Outbreaks_per_week[week] = (early_rate_samples[early_mask].sum() + late_rate_samples[late_mask].sum() + later_rate_samples[later_mask].sum()) / N

plt.plot(range(number_of_weeks), expected_Outbreaks_per_week, lw=4, color="#E24A33",

         label="expected number of Outbreak incident")

plt.xlim(0, number_of_weeks)

plt.xlabel("Time (weeks)")

plt.ylabel("Expected # Outbreaks")

plt.title("Expected number of Outbreak Incident")

plt.ylim(0, 15)

plt.bar(np.arange(len(twitter_df['twitter_stream_count'])), twitter_df['twitter_stream_count'], color="#348ABD", alpha=0.65,

        label="observed Outbreak per week")

plt.legend(loc="upper left")


`

Can you use 3 backticks (`) to format the code? You can use all markdown and some extra features here, using > for quoting content doesn’t preserve the indentation so it’s difficult to read python code

Thanks, I have updated the post.

How can one test model effectiveness in PyMC3, similar to frequentist approach such as Mean Absolute Error(MAE), Root Mean Square Error (RMSE), AUC plot, and Goodness-of-fit, which are available for testing regression models?