The last post in the linked thread shows how to use pm.MutableData to do out-of-sample predictions in a BART model, and I don’t see any of that in the code block you just posted here. Am I missing something?
Yes, I often confuse myself to the point where I am on the right track and then get lost again.
Now if I make the data mutable, pass shape[0] to observed, and leave out y except at the beginning, where it is required, it looks like this:
with pm.Model() as model:
    # Define mutable data
    x_data = pm.MutableData("x_data", x_data)
    # Define prior for mu
    mu = pmb.BART("mu", X= x_data, Y=y_data)
    # Define likelihood
    y_pred = pm.Poisson("y_pred", mu=mu, observed=X.shape[0])
    # Sampling
    idata = pm.sample(random_seed=RANDOM_SEED)
# Generate new data
new_X = generate_new_data(...)
# Use the existing model to generate posterior predictive samples for the new data
with model:
    pm.set_data({"data_X": new_X})
    ppc = pm.sample_posterior_predictive(idata, var_names=["mu", "y_pred"], samples=10)
and get:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
1 with pm.Model() as model:
2 # Define mutable data
----> 3 x_data = pm.MutableData("x_data", x_data)
5 # Define prior for mu
6 mu = pmb.BART("mu", X= x_data, Y=y_data)
File ~\anaconda3\lib\site-packages\pymc\data.py:295, in MutableData(name, value, dims, coords, export_index_as_coords, **kwargs)
281 def MutableData(
282 name: str,
283 value,
(...)
288 **kwargs,
289 ) -> SharedVariable:
290 """Alias for ``pm.Data(..., mutable=True)``.
291
292 Registers the ``value`` as a :class:`~pytensor.compile.sharedvalue.SharedVariable`
293 with the model. For more information, please reference :class:`pymc.Data`.
294 """
--> 295 var = Data(
296 name,
297 value,
298 dims=dims,
299 coords=coords,
300 export_index_as_coords=export_index_as_coords,
301 mutable=True,
302 **kwargs,
303 )
304 return cast(SharedVariable, var)
File ~\anaconda3\lib\site-packages\pymc\data.py:415, in Data(name, value, dims, coords, export_index_as_coords, mutable, **kwargs)
413 mutable = False
414 if mutable:
--> 415 x = pytensor.shared(arr, name, **kwargs)
416 else:
417 x = at.as_tensor_variable(arr, name, **kwargs)
File ~\anaconda3\lib\site-packages\pytensor\compile\sharedvalue.py:199, in shared(value, name, strict, allow_downcast, **kwargs)
171 r"""Create a `SharedVariable` initialized with a copy or reference of `value`.
172
173 This function iterates over constructor functions to find a
(...)
195
196 """
198 if isinstance(value, Variable):
--> 199 raise TypeError("Shared variable values can not be symbolic.")
201 try:
202 var = shared_constructor(
203 value,
204 name=name,
(...)
207 **kwargs,
208 )
TypeError: Shared variable values can not be symbolic.
What is the datatype of the x_data you are passing into pm.MutableData?
In the coal_mine example from BART, x_data is defined by discretizing the data:
# discretize data
years = int(X.max() - X.min())
bins = years // 4
hist, x_edges = np.histogram(Y, bins=bins)
# compute the location of the centers of the discretized data
x_centers = x_edges[:-1] + (x_edges[1] - x_edges[0]) / 2
# xdata needs to be 2D for BART
x_data = x_centers[:, None]
# express data as the rate number of disaster per year
y_data = hist / 4
So I kept that style.
The error says “Shared variable values can not be symbolic”. “Symbolic” means it’s a pytensor tensor, plus the error is a “TypeError”, so you should check type(x_data) and ensure that it is what you expect. In the code you posted, it is a NumPy array, but I think you ran another model or something that overrode the name x_data.
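One way the name could get overridden, judging only from the code posted above: the line x_data = pm.MutableData("x_data", x_data) rebinds x_data to the model variable, so running the same cell a second time would pass a symbolic variable instead of the array. The sketch below illustrates that mechanism without PyMC. SymbolicStandIn and require_ndarray are hypothetical names made up for this illustration, not part of any library.

```python
import numpy as np


class SymbolicStandIn:
    """Hypothetical stand-in for the symbolic variable pm.MutableData returns."""


def require_ndarray(value, name):
    """Raise a descriptive TypeError unless value is a plain NumPy array."""
    if not isinstance(value, np.ndarray):
        raise TypeError(
            f"{name} is a {type(value).__name__}, not a NumPy array; "
            "the name may have been rebound to a model variable."
        )
    return value


# First run of the cell: the name still refers to the raw array.
x_data = np.linspace(1851.0, 1962.0, 10)[:, None]
require_ndarray(x_data, "x_data")

# Mimic `x_data = pm.MutableData("x_data", x_data)`: the name is rebound.
x_data = SymbolicStandIn()

# Second run of the same cell: the check now fails early with a clear message.
try:
    require_ndarray(x_data, "x_data")
    rebinding_detected = False
except TypeError as err:
    rebinding_detected = True
    print(err)
```

If this is what happened, giving the shared variable its own Python name inside the model (for example x_shared = pm.MutableData("x_data", x_data)) would leave the NumPy array untouched between runs.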
You are right in everything you say. It is a NumPy array, but I have no idea where it gets changed. I will have to look into this in more depth.
I saw on your LinkedIn profile that you did a course in Reinforcement Learning and speak Chinese as well. Amazing!
Would you mind if I wrote you an email tomorrow? I have a question regarding MCTS for finance, which I am also trying to build, and I would like to ask your opinion.
Yeah of course, always happy to help
Happy Easter holiday! Frohe Ostern.
Any thoughts are appreciated.
I wonder if you found a solution for the error message "Size length is incompatible with batched dimensions of parameter 0 alpha: len(size) = 1, len(batched dims alpha) = 2. Size length must be 0 or >= 2"?
I get a similar error message; my code is listed below:
import numpy as np
import pymc as pm
import scipy.stats

labels = ["Intercept", "Effective water level above ground [m]"]
x = data[labels].to_numpy()
rloss = data["Building damage ratio"].to_numpy()
var = data["Effective water level above ground [m]"].var()
# train-test split
test_indices = range(len(x) - 10, len(x))
train_indices = range(len(x) - 10)
x_train, x_test = x[train_indices], x[test_indices]
rloss_train, rloss_test = rloss[train_indices], rloss[test_indices]
# Define and fit the model
coords = {"coeffs": labels}
with pm.Model(coords=coords) as model:
    # Data containers
    x = pm.MutableData("x", x_train)
    rloss = pm.MutableData("rloss", rloss_train)
    # Priors
    a = pm.Normal("a", mu=0, sigma=1, dims="coeffs")
    b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")
    # Link function
    mu = pm.Deterministic("mu", pm.math.invlogit(a*np.sqrt(x)+b))
    phi = pm.Deterministic("phi", pm.math.exp(mu*(1-mu)/var-1))
    alpha = pm.Deterministic("alpha", mu * phi)
    beta = pm.Deterministic("beta", (1 - mu) * phi)
    def logp_beta(obsLoss, alpha, beta):
        return pm.logp(pm.Beta.dist(alpha=alpha, beta=beta), obsLoss)
    def random(alpha, beta, rng= None, size=None) :
        return scipy.stats.beta(alpha, beta).rvs(size)
    rloss_custom = pm.CustomDist('rloss_custom', alpha, beta, logp=logp_beta, random=random, observed=rloss, shape=x.shape[0])
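A possible reading of that message, offered as a sketch and not a confirmed diagnosis: x has two columns and a and b each have two entries, so a*np.sqrt(x)+b broadcasts to a two-dimensional array, and alpha inherits that shape, while shape=x.shape[0] asks for a one-dimensional variable. The NumPy snippet below only reproduces the shapes. The data are made up, n = 5 is arbitrary, and the dot product is just one hypothetical way to get one value per row (it changes the model, so it may not be what is intended).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # arbitrary number of rows, for illustration only

# Made-up design matrix: an intercept column plus one positive predictor.
x = np.column_stack([np.ones(n), rng.uniform(0.1, 2.0, size=n)])  # shape (n, 2)
a = rng.normal(size=2)  # one coefficient per column, like dims="coeffs"
b = rng.normal(size=2)

# Elementwise expression as written in the model: broadcasts to (n, 2).
elementwise = a * np.sqrt(x) + b

# One hypothetical alternative: a linear predictor with one value per row.
per_row = np.sqrt(x) @ a  # shape (n,)

print(elementwise.shape, per_row.shape)
```

Checking mu.shape.eval() and alpha.shape.eval() inside the model context should show whether the same mismatch occurs with the real data.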