Is there a way to generate synthetic data sets using PyMC, where the synthetic data would capture the relationships and distributions of both the model and the original data set? Thanks. Paul
Hi Paul!
`pm.sample_posterior_predictive` generates artificial (observed) data, given the model and covariates; this might be what you want? There's an example notebook showing how it's used here
Welcome!
I’m not sure what the “correct” relationships/distributions of both the model and the data would be. You can generate posterior predictive samples, which are draws from the posterior used to generate credible (synthetic) data. Or, if you want to investigate the dependencies among model parameters (ignoring observed data), you can sample from your model without including any observed variables. That yields an MCMC trace containing draws from your posterior that are then pushed through the rest of your model:
import pymc as pm

with pm.Model() as model:
    a = pm.Gamma("a", alpha=1, beta=1)
    b = pm.Normal("b", mu=a, sigma=1)
    c = pm.StudentT("c", mu=b, sigma=1, nu=3)
    idata = pm.sample(10)

print(idata.posterior)
yields:
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
b (chain, draw) float64 1.368 1.93 1.213 3.031 ... 1.81 0.6625 0.2195
c (chain, draw) float64 3.798 3.018 4.84 5.91 ... 1.844 2.292 -0.6863
a (chain, draw) float64 0.4783 0.5831 0.8971 ... 0.2333 0.7063 0.4604
Attributes:
created_at: 2022-07-15T01:55:13.283567
arviz_version: 0.12.1
inference_library: pymc
inference_library_version: 4.1.2
sampling_time: 0.702103853225708
tuning_steps: 1000
Are either of those what you are looking for?
Thank you.
Paul
Yes, this is what I was looking for.
Thanks.
Paul