Data container question

It seems that the newish pm.Data container only works when the new observed data has exactly the same shape as the original. It would be useful to reuse a model with the same form of observed data but a different number of observations.

Is this doable with the Data container class? If not, is there a recommended way of doing this, or should we fall back on rebuilding the model?

Example of the issue (adapted from the docs: https://docs.pymc.io/notebooks/data_container.html):

```python
%matplotlib inline
import numpy as np
import pandas as pd
import pymc3 as pm
import arviz as az

true_mu = 30
observed_data = true_mu + np.random.randn(10)

with pm.Model() as model:
    # Wrap the observations in a Data container so they can be swapped later
    data = pm.Data('data', observed_data)
    mu = pm.Normal('mu', 0, 10)
    pm.Normal('y', mu=mu, sigma=1, observed=data)
    trace = pm.sample()
```

We can reuse the same model as long as the new observed data has exactly the same shape:

```python
true_mu = 10
observed_data = true_mu + np.random.randn(10)  # same length (10) as before

with model:
    pm.set_data({'data': observed_data})
    trace = pm.sample()

az.plot_trace(trace);
```

But not if we have a different number of observations:

```python
true_mu = 10
observed_data = true_mu + np.random.randn(50)  # 50 observations instead of 10

with model:
    pm.set_data({'data': observed_data})
    trace = pm.sample()  # fails with the ValueError below

az.plot_trace(trace);
```

This fails with:

```
ValueError: Elemwise{sub,no_inplace}.grad returned object of shape (10,) as gradient term on input 0 of shape (50,)
```

Currently it is not possible to sample the posterior again using data of a different shape; see, for example, this issue (the first comments are relevant; the discussion eventually changed topic).

What is possible is to use pm.Data to sample the posterior predictive with different data. This usually works, with some exceptions (e.g. https://github.com/pymc-devs/pymc3/issues/3640).
