How to interpret a very basic PyMC3 model?

Hello everyone,
I am a beginner with PyMC3 and I can't quite figure out how to use it. In this example I am trying to find
the distribution of just one variable, but I can't get it to work. Could you please look at my model and my interpretation of it and tell me where I am going wrong?

import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

data = (np.random.randn(20) + 15) * 2
with pm.Model() as model:
    mean = pm.Uniform('mean', lower=0, upper=35)
    std = pm.Uniform('std', lower=0, upper=5)
    X = pm.Normal('X', mu=mean, sd=std, observed=data)
    n_samples = 10000
    approx = pm.fit(n=n_samples, method=pm.ADVI())
ndraws = 1000
trace = approx.sample(draws=ndraws)
ndraws2 = 100
samples = pm.sample_posterior_predictive(trace, var_names=['X', 'std', 'mean'], model=model, size=ndraws2)
samples = samples['X']
avg = np.mean(samples[200:], axis=0)
plt.hist(avg)
plt.hist(data)

Result: (attached plot: overlaid histograms of avg and data)

Here is what I think I am telling PyMC3:

  1. I have some data. I don’t know the distribution of the data but I suspect it might be Gaussian (Hence X = pm.Normal(…)).
  2. I have no idea what the mean and standard deviation of the data are, but they are probably between 0 and 35, and between 0 and 5, respectively. (Hence mean = pm.Uniform('mean', lower=0, upper=35) and std = pm.Uniform('std', lower=0, upper=5))
  3. Find the distribution of my data (Hence approx = pm.fit(n = n_samples, method = pm.ADVI()))
  4. Give me 1000 samples for what you think the values of 'mean' and 'std' might be (ndraws = 1000, trace = approx.sample(draws = ndraws))
  5. Using each of these 1000 values for 'mean' and 'std', produce 100 values for 'X' (ndraws2 = 100,
    samples = pm.sample_posterior_predictive(trace, var_names=['X'], model=model, size=ndraws2))

I expected the two histograms to be almost the same, so why aren't they? The mean seems to match, but the variance is very off.

Hi @vtac!
I think the problem lies in understanding what sample_posterior_predictive gives you.
As far as I can see, you only get the "mean" from it, and cannot predict what goes into the sd keyword of your X. Considering this, the results don't look bad at all. Think of the narrower histogram as the "standard error of the mean", though the statistical rigor police might arrest me for drawing that analogy.
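The "standard error of the mean" effect is easy to see with plain numpy. The array below is a synthetic stand-in for your samples['X'] (not actual PyMC3 output): averaging over many draws per data point, which is what np.mean(samples[200:], axis=0) does, shrinks the spread by roughly the square root of the number of draws, and that is why the avg histogram is so much narrower than the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "posterior predictive" draws: 1000 sets of 20 observations,
# drawn from a normal similar to the toy data (mean 30, sd 2).
draws = rng.normal(loc=30, scale=2, size=(1000, 20))

# Averaging over the 1000 draws (the axis=0 mean in the original code)
# leaves 20 values whose spread is roughly 2 / sqrt(1000), not 2.
avg = draws.mean(axis=0)

print(draws.std())  # close to 2, the actual predictive spread
print(avg.std())    # much smaller, roughly 2 / sqrt(1000)
```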

Besides that, the samples['X'] you get shouldn't be averaged. They are a numpy array with a certain shape: one dimension equals the number of observations, another one the number of posterior draws… This can be quite confusing, and I think a rework is under way for pymc 4; I tend to just flatten out the predictions (.ravel()), but that might be too short-sighted.
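As a sketch of what I mean, here is the flattening approach with a synthetic array standing in for samples['X'] (the exact shape of the real thing depends on your pymc3 version and the size argument you passed): ravel everything into one long 1-D array and histogram that, instead of averaging.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so this sketch runs anywhere
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Synthetic stand-in for samples['X'], e.g. (size, draws, observations);
# the real shape depends on your pymc3 version and arguments.
ppc = rng.normal(loc=30, scale=2, size=(100, 50, 20))

flat = ppc.ravel()  # one long 1-D array of all predictive values
print(flat.shape)   # (100000,)

# Histogram the flattened predictions directly, with no averaging.
plt.hist(flat, bins=50, density=True, alpha=0.5, label='posterior predictive')
plt.legend()
plt.savefig('ppc_hist.png')
```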

Hope this helps!

Unless you know what you're doing, I recommend sticking with MCMC sampling instead of ADVI.

I’m not sure if I understood what you’re trying to achieve, but how about this?

import numpy as np
import pymc3 as pm
import arviz as az

data = (np.random.randn(20)+15)*2
with pm.Model() as model:
    data_mean = pm.Uniform("data_mean", lower=0, upper=35)
    data_std  = pm.Uniform("data_std", lower=0, upper=5)
    X = pm.Normal("X", mu=data_mean, sd=data_std, observed=data)

    # draw posterior samples by MCMC
    idata = pm.sample(return_inferencedata=True)
az.plot_forest(idata);

The forest plot shows each chain separately. (I had 3 chains.)

az.summary(idata)

The ArviZ summary gives you a highest density interval (HDI) which is the narrowest credible interval.
So with this, the mean lies in the interval [28.810, 30.917] with a 94 % probability.
Likewise, the std lies in the interval [1.773, 3.323] with a 94 % probability.
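To unpack what "narrowest credible interval" means, here is a minimal numpy sketch of an HDI computation. The hdi function is a hypothetical helper for illustration (ArviZ's actual implementation may differ in its details), and the draws below are a toy stand-in for a posterior, not real sampler output.

```python
import numpy as np

def hdi(samples, mass=0.94):
    """Narrowest interval containing `mass` of the samples
    (a sketch of what ArviZ reports as hdi_3% / hdi_97%)."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(mass * n))          # number of points the interval must cover
    widths = s[k - 1:] - s[:n - k + 1]  # width of every candidate window of k points
    i = np.argmin(widths)               # pick the narrowest window
    return s[i], s[i + k - 1]

rng = np.random.default_rng(2)
post = rng.normal(loc=29.9, scale=0.55, size=10_000)  # toy posterior draws

lo, hi = hdi(post)
print(lo, hi)  # for normal draws, close to mean +/- 1.88 * sd
```

By construction the returned interval contains 94% of the draws and is the shortest such interval, which is why the HDI can be narrower than an equal-tailed interval for skewed posteriors.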
