Apparent Posterior Bias With Increased Number of Data Points

I am new to PyMC and have run into an odd issue: the more data points I use, the worse the answer I get.

I have been following this example of modeling a Gaussian distribution with an unknown standard deviation (see especially equations 1 and 6). I create my sample by drawing N ~ 10 data points from a normal distribution with standard deviation 2.5 using numpy.random.normal().

To estimate the standard deviation, I use a Poisson likelihood whose mean is the value of a normal distribution with the given standard deviation, evaluated at each data point.
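In symbols (just restating what the code below does), writing the i-th data point as $v_i$, each observed count of 1 is modeled as Poisson with mean

$$\mu_i = \frac{1}{\sigma}\,\exp\!\left(-\frac{v_i^2}{2\sigma^2}\right),$$

which is the Gaussian density at $v_i$ apart from the $1/\sqrt{2\pi}$ factor.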

import numpy as np
from pymc import Model, Poisson, Uniform, sample

true_sigma = 2.5
num = 50  # Length of dataset
vs = np.random.normal(0, true_sigma, num)  # Draw samples from a normal distribution with standard deviation 2.5
counts = np.ones_like(vs)  # Each data point is observed as a count of 1

with Model() as model:
    # Define prior
    sigma = Uniform('sigma', lower=0, upper=20)

    # Define likelihood: Poisson mean is the Gaussian density at each data point
    likelihood = Poisson('y', 1/sigma * np.exp((-(vs)**2)/(2 * sigma**2)), observed=counts)

    idata = sample(10000, tune=5000)

When I run PyMC on this, the posterior for the standard deviation is very close to the true value (2.5) used to create the data.

When I increase the number of data points (N = 500, 5000), the posteriors become increasingly offset from the true value, until eventually the correct value is completely ruled out.
These are the resulting posteriors for different dataset sizes:

What is happening here?

Welcome!

I didn’t read through the linked example that closely, but why Poisson? If you just want to sample from an arbitrary likelihood, you can use pm.Potential(), but it really depends on what you are trying to do.
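For reference, here is a minimal sketch of what that could look like for this model, assuming the only free parameter is sigma and writing the Gaussian log-likelihood by hand (the constant -0.5*log(2*pi) term is dropped since it doesn't affect the posterior):

import numpy as np
import pymc as pm

true_sigma = 2.5
vs = np.random.normal(0, true_sigma, 500)

with pm.Model() as model:
    sigma = pm.Uniform('sigma', lower=0, upper=20)

    # Hand-written Gaussian log-likelihood of the data;
    # the constant term is omitted because it does not depend on sigma
    logp = -pm.math.log(sigma) - vs**2 / (2 * sigma**2)
    pm.Potential('loglike', logp.sum())

    idata = pm.sample(1000, tune=1000)

With the full likelihood in there, the posterior for sigma should concentrate around the true value as you add data rather than drift away from it.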

Could you explain a bit more about why 1/sigma * np.exp((-(vs)**2)/(2 * sigma**2)) is supplied as the mean (mu) of the Poisson here? I see that it’s a Gaussian density, but I don’t understand how it’s being used.