Hello everyone,
I am a beginner with PyMC3 and I am struggling to understand how to use it. In this example I am trying to infer
the distribution of a single variable, but the result is not what I expect. Could you please look at my model and my interpretation of it, and tell me where I am going wrong:
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

data = (np.random.randn(20) + 15) * 2

with pm.Model() as model:
    mean = pm.Uniform('mean', lower=0, upper=35)
    std = pm.Uniform('std', lower=0, upper=5)
    X = pm.Normal('X', mu=mean, sd=std, observed=data)
    n_samples = 10000
    approx = pm.fit(n=n_samples, method=pm.ADVI())

ndraws = 1000
trace = approx.sample(draws=ndraws)
ndraws2 = 100
samples = pm.sample_posterior_predictive(trace, var_names=['X', 'std', 'mean'], model=model, size=ndraws2)
samples = samples['X']
avg = np.mean(samples[200:], axis=0)
plt.hist(avg)
plt.hist(data)
result:
Here is what I think I am telling PyMC3:
- I have some data. I don’t know the distribution of the data but I suspect it might be Gaussian (Hence X = pm.Normal(…)).
- I have no idea what the mean and standard deviation of the data are, but they are probably between 0 and 35 and between 0 and 5, respectively. (Hence mean = pm.Uniform('mean', lower=0, upper=35) and std = pm.Uniform('std', lower=0, upper=5))
- Find the distribution of my data (Hence approx = pm.fit(n = n_samples, method = pm.ADVI()))
- Give me 1000 samples of what you think the values of 'mean' and 'std' might be (ndraws = 1000, trace = approx.sample(draws = ndraws))
- Using each of these 1000 values for 'mean' and 'std', produce 100 values for 'X' (ndraws2 = 100, samples = pm.sample_posterior_predictive(trace, var_names=['X'], model=model, size=ndraws2))
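To make the last two steps concrete, here is a minimal NumPy-only sketch of what I am describing. It does not use PyMC3: `mean_draws` and `std_draws` are made-up stand-ins for the draws in `trace`, the numbers 30, 0.5, 2 and 0.3 are arbitrary, and `x_sim` is a hypothetical name. It shows only the array shape my description implies, which is not necessarily what `pm.sample_posterior_predictive` returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data, generated the same way as in the model code above.
data = (rng.standard_normal(20) + 15) * 2

ndraws = 1000   # number of (mean, std) draws
ndraws2 = 100   # number of X values per draw, for each data point

# Assumption: made-up stand-ins for the posterior draws in `trace`.
# A real trace would come from approx.sample(...), not from these lines.
mean_draws = rng.normal(30.0, 0.5, size=ndraws)
std_draws = np.abs(rng.normal(2.0, 0.3, size=ndraws))

# For every (mean, std) draw, simulate ndraws2 replicated data sets
# with the same length as `data`.
x_sim = rng.normal(
    loc=mean_draws[:, None, None],
    scale=std_draws[:, None, None],
    size=(ndraws, ndraws2, data.size),
)

print(x_sim.shape)
```

Under this reading, the first axis indexes the draws of 'mean' and 'std', the second indexes the repeated simulations for each draw, and the last indexes the data points.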
I am expecting the two histograms to be almost the same. Why aren't they? The means seem to match, but the variances are very different.
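To put numbers on that comparison instead of judging the histograms by eye, a small NumPy sketch can print the mean and standard deviation of each array. The `samples` array here is a synthetic stand-in (an assumption, not PyMC3 output) with an arbitrary mean of 30 and standard deviation of 2, and `all_values` is a hypothetical extra name for the simulated values without any averaging.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data, generated the same way as in the model code above.
data = (rng.standard_normal(20) + 15) * 2

# Assumption: synthetic stand-in for samples['X'],
# here 1000 simulated data sets of 20 points each.
samples = rng.normal(loc=30.0, scale=2.0, size=(1000, data.size))

# Same reduction as in the code above: average over the sample axis.
avg = np.mean(samples[200:], axis=0)

# The same simulated values, flattened without averaging.
all_values = samples[200:].ravel()

for name, values in [("data", data), ("avg", avg), ("all values", all_values)]:
    print(f"{name:>10}: mean={values.mean():.2f}, std={values.std(ddof=1):.2f}")
```

Printing the three summaries side by side shows whether the spread of `avg` or of `all_values` is closer to the spread of `data`.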