# Apparent Posterior Bias With Increased Number of Data Points

I am new to PyMC and have run into an odd issue: the more data points I use, the worse the answer I get.

I have been following this example of modeling a Gaussian distribution with an unknown standard deviation (see especially equations 1 and 6). I create my sample by drawing N data points (N = 50 in the code below) from a normal distribution with standard deviation 2.5 using numpy.random.normal().

To estimate the standard deviation, I use a Poisson likelihood whose mean is the value of a Gaussian density, with the given standard deviation, evaluated at each data point.

```python
import numpy as np
from pymc import Model, Uniform, Poisson, sample

true_sigma = 2.5
num = 50  # Length of dataset
vs = np.random.normal(0, true_sigma, num)  # Draw samples from a normal distribution with standard deviation 2.5
counts = np.ones_like(vs)

with Model() as model:
    # Define prior
    sigma = Uniform('sigma', lower=0, upper=20)

    # Define likelihood: Poisson mean set to an (unnormalized) Gaussian density
    likelihood = Poisson('y', 1/sigma * np.exp((-(vs)**2)/(2 * sigma**2)), observed=counts)

    idata = sample(10000, tune=5000)
```

When I run PyMC on this, the posterior for the standard deviation is very close to the true value (2.5) used to create the data.

When I increase the number of data points (N = 500, 5000), the posteriors become increasingly offset from the true value, until eventually the correct value is completely ruled out.
These are the resulting posteriors for different dataset lengths:

What is happening here?

Welcome!

I didn’t read through the source that closely, but why Poisson? If you just want to sample from an arbitrary likelihood, you can use `pm.Potential()`, but it really depends on what you are trying to do.
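To illustrate the `pm.Potential` idea (a hypothetical sketch, not code from this thread): `pm.Potential` adds an arbitrary log-probability term to the model, so for zero-mean Gaussian data you can sum the Gaussian log-density directly instead of routing a density through a Poisson. The helper name `gaussian_loglike` is my own for illustration:

```python
import numpy as np

# Hypothetical sketch: the log-likelihood term you would hand to
# pm.Potential for zero-mean Gaussian data with unknown scale sigma.
def gaussian_loglike(sigma, data):
    """Summed log-density of Normal(0, sigma) over the data,
    dropping the constant -n/2 * log(2*pi) term."""
    n = len(data)
    return -n * np.log(sigma) - np.sum(data**2) / (2 * sigma**2)

# Inside a PyMC model this would look roughly like:
#     with pm.Model():
#         sigma = pm.Uniform("sigma", lower=0, upper=20)
#         pm.Potential("loglike",
#                      -len(vs) * pm.math.log(sigma)
#                      - pm.math.sum(vs**2) / (2 * sigma**2))
```

As a sanity check, maximizing `gaussian_loglike` over `sigma` recovers the root-mean-square of the data, which is the MLE for the scale of a zero-mean Gaussian.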

Could you explain a bit more about why `1/sigma * np.exp((-(vs)**2)/(2 * sigma**2))` is supplied as the rate (mean) parameter for the Poisson here? I see that it’s a Gaussian density, but I don’t understand how it’s being used.