Data container question

drbenvincent · February 26, 2020, 11:24am

It seems that the newish Data class only works when the new observed data has exactly the same shape as the original. It would be useful to be able to use the Data class to re-use a model on observed data of the same form but with a different number of observations.

Is this doable using the Data container class? If not, is there a recommended way of doing this - or should we fall back on re-building models?

Example of the issue (adapted from the docs: https://docs.pymc.io/notebooks/data_container.html)

%matplotlib inline
import numpy as np
import pandas as pd
import pymc3 as pm
import arviz as az

true_mu = 30
observed_data = true_mu + np.random.randn(10)

with pm.Model() as model:
    data = pm.Data('data', observed_data)
    mu = pm.Normal('mu', 0, 10)
    pm.Normal('y', mu=mu, sigma=1, observed=data)
    trace = pm.sample()

We can re-use the same model as long as the observed data has exactly the same shape:

true_mu = 10
observed_data = true_mu + np.random.randn(10)

with model:
    pm.set_data({'data': observed_data})
    trace = pm.sample()
    
az.plot_trace(trace);

But not if we have a different number of observations:

true_mu = 10
observed_data = true_mu + np.random.randn(50)

with model:
    pm.set_data({'data': observed_data})
    trace = pm.sample()
    
az.plot_trace(trace);

Error:

ValueError: Elemwise{sub,no_inplace}.grad returned object of shape (10,) as gradient term on input 0 of shape (50,)

OriolAbril · February 26, 2020, 7:13pm

Currently it is not possible to sample the posterior again using data with a different shape; see, for example, this issue (the first comments are relevant; the discussion eventually changes topic).

What is possible is to use pm.Data to sample the posterior predictive using different data. This usually seems to work, with some exceptions (e.g. https://github.com/pymc-devs/pymc3/issues/3640).
