Is this half-Cauchy model correct?

The last post in the linked thread shows how to use pm.MutableData to do out-of-sample predictions in a BART model, and I don’t see any of that in the code block you just posted here. Am I missing something?

Yes, I often confuse myself to the point where I am on the right track and then get lost again.
Now, if I make the data mutable, pass shape[0] to observed, and leave out y except at the beginning, where it is required, it looks like this:

with pm.Model() as model:
   # Define mutable data
    x_data = pm.MutableData("x_data", x_data)
   
   # Define prior for mu
    mu = pmb.BART("mu", X= x_data, Y=y_data)
   
   # Define likelihood
    y_pred = pm.Poisson("y_pred", mu=mu, observed=X.shape[0])
   
   # Sampling
    idata = pm.sample(random_seed=RANDOM_SEED)
   
    # Generate new data
    new_X = generate_new_data(...)
   
    # Use the existing model to generate posterior predictive samples for the new data
with model:
    pm.set_data({"data_X": new_X})
    ppc = pm.sample_posterior_predictive(idata, var_names=["mu", "y_pred"], samples=10)

and get:

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [18], in <cell line: 1>()
      1 with pm.Model() as model:
      2    # Define mutable data
----> 3     x_data = pm.MutableData("x_data", x_data)
      5    # Define prior for mu
      6     mu = pmb.BART("mu", X= x_data, Y=y_data)

File ~\anaconda3\lib\site-packages\pymc\data.py:295, in MutableData(name, value, dims, coords, export_index_as_coords, **kwargs)
    281 def MutableData(
    282     name: str,
    283     value,
   (...)
    288     **kwargs,
    289 ) -> SharedVariable:
    290     """Alias for ``pm.Data(..., mutable=True)``.
    291 
    292     Registers the ``value`` as a :class:`~pytensor.compile.sharedvalue.SharedVariable`
    293     with the model. For more information, please reference :class:`pymc.Data`.
    294     """
--> 295     var = Data(
    296         name,
    297         value,
    298         dims=dims,
    299         coords=coords,
    300         export_index_as_coords=export_index_as_coords,
    301         mutable=True,
    302         **kwargs,
    303     )
    304     return cast(SharedVariable, var)

File ~\anaconda3\lib\site-packages\pymc\data.py:415, in Data(name, value, dims, coords, export_index_as_coords, mutable, **kwargs)
    413     mutable = False
    414 if mutable:
--> 415     x = pytensor.shared(arr, name, **kwargs)
    416 else:
    417     x = at.as_tensor_variable(arr, name, **kwargs)

File ~\anaconda3\lib\site-packages\pytensor\compile\sharedvalue.py:199, in shared(value, name, strict, allow_downcast, **kwargs)
    171 r"""Create a `SharedVariable` initialized with a copy or reference of `value`.
    172 
    173 This function iterates over constructor functions to find a
   (...)
    195 
    196 """
    198 if isinstance(value, Variable):
--> 199     raise TypeError("Shared variable values can not be symbolic.")
    201 try:
    202     var = shared_constructor(
    203         value,
    204         name=name,
   (...)
    207         **kwargs,
    208     )

TypeError: Shared variable values can not be symbolic.

What is the datatype of the x_data you are passing into pm.MutableData?

In the coal_mine example from BART, x_data is defined while discretizing the data:

# discretize data
years = int(X.max() - X.min())
bins = years // 4
hist, x_edges = np.histogram(Y, bins=bins)
# compute the location of the centers of the discretized data
x_centers = x_edges[:-1] + (x_edges[1] - x_edges[0]) / 2
# xdata needs to be 2D for BART
x_data = x_centers[:, None]
# express data as the rate (number of disasters per year)
y_data = hist / 4

So I kept that style.

The error says “Shared variable values can not be symbolic”. “Symbolic” means it’s a pytensor tensor, plus the error is a “TypeError”, so you should check type(x_data) and ensure that it is what you expect. In the code you posted, it is a numpy array, but I think you ran another model or something that overrode the name x_data.
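A quick way to confirm this is to check the type before building the model, and to bind the result of pm.MutableData to a different Python name so that re-running the cell cannot hand a symbolic variable back to pm.MutableData. The sketch below uses only NumPy; the helper name `require_concrete` and the toy array are made up for illustration.

```python
import numpy as np


def require_concrete(value, name="value"):
    """Return `value` as a float NumPy array, or raise if it is not plain data.

    Hypothetical helper: it mirrors the kind of check that fails when a
    symbolic variable, rather than an array, is passed as shared data.
    """
    if not isinstance(value, (np.ndarray, list, tuple)):
        raise TypeError(
            f"{name} has type {type(value).__name__}; expected a NumPy array. "
            "Was the name rebound inside an earlier `with pm.Model()` block?"
        )
    return np.asarray(value, dtype=float)


# Toy stand-in for the discretized bin centers (2D, as BART expects).
x_data = np.linspace(1850.0, 1960.0, 5)[:, None]

x_checked = require_concrete(x_data, "x_data")
print(type(x_checked).__name__, x_checked.shape)
```

Inside the model, writing something like `x_shared = pm.MutableData("x_data", x_data)` leaves the NumPy array under `x_data` untouched. The key passed to `pm.set_data` then has to be the registered name, `"x_data"`, not `"data_X"`.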

You are right about everything you say. It is a NumPy array, but I have no idea where it gets changed. I will have to look into this more deeply.

I saw in your LinkedIn profile that you did a course in Reinforcement Learning and speak Chinese as well, :smiley: Amazing!

Would you mind if I wrote you an email tomorrow? I have a question about MCTS for finance, which I am also trying to build, and I would like to ask your opinion.

Yeah of course, always happy to help

Happy Easter holiday! Frohe Ostern.

Any thoughts are appreciated.

I wonder whether you found a solution for the error message "Size length is incompatible with batched dimensions of parameter 0 alpha: len(size) = 1, len(batched dims alpha) = 2. Size length must be 0 or >= 2"?

I get a similar error message; my code is listed below:

import numpy as np
import pymc as pm
import scipy.stats

labels = ["Intercept", "Effective water level above ground [m]"]
x = data[labels].to_numpy()
rloss = data["Building damage ratio"].to_numpy()
var = data["Effective water level above ground [m]"].var()

# Train-test split: hold out the last 10 rows
test_indices = range(len(x) - 10, len(x))
train_indices = range(len(x) - 10)
x_train, x_test = x[train_indices], x[test_indices]
rloss_train, rloss_test = rloss[train_indices], rloss[test_indices]

# Define and fit the model
coords = {"coeffs": labels}

with pm.Model(coords=coords) as model:
    # Data containers
    x = pm.MutableData("x", x_train)
    rloss = pm.MutableData("rloss", rloss_train)

    # Priors
    a = pm.Normal("a", mu=0, sigma=1, dims="coeffs")
    b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")

    # Link function
    mu = pm.Deterministic("mu", pm.math.invlogit(a*np.sqrt(x)+b))
    phi = pm.Deterministic("phi", pm.math.exp(mu*(1-mu)/var-1))

    alpha = pm.Deterministic("alpha", mu * phi)
    beta = pm.Deterministic("beta", (1 - mu) * phi)

    # Log-density of the observations under a Beta(alpha, beta) distribution
    def logp_beta(obsLoss, alpha, beta):
        return pm.logp(pm.Beta.dist(alpha=alpha, beta=beta), obsLoss)

    # Draws used for prior and posterior predictive sampling
    def random(alpha, beta, rng=None, size=None):
        return scipy.stats.beta(alpha, beta).rvs(size)

    rloss_custom = pm.CustomDist(
        "rloss_custom", alpha, beta,
        logp=logp_beta, random=random, observed=rloss, shape=x.shape[0],
    )
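
One possible reading of that error, sketched with NumPy only: `labels` has two entries, so `x_train` has shape (n, 2), and `a` and `b` carry the two-element `coeffs` dimension. The expression `a*np.sqrt(x)+b` then broadcasts to (n, 2), which makes `alpha` two-dimensional, while `shape=x.shape[0]` asks for a one-dimensional result. The values below are invented, and the matrix-product variant at the end is only an assumption about what the model was meant to compute, not a confirmed fix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # made-up number of training rows

# Column 0 plays the role of "Intercept", column 1 the water level.
x_train = np.column_stack([np.ones(n), rng.uniform(0.0, 3.0, size=n)])
a = rng.normal(size=2)  # one coefficient per entry of `labels`
b = rng.normal(size=2)

# Elementwise product, as in the posted model: (2,) broadcasts against (n, 2).
eta_elementwise = a * np.sqrt(x_train) + b
print(eta_elementwise.shape)  # (6, 2): two batched dimensions

# Assumed alternative: one linear predictor per row via a matrix product.
# `b` is left out here on the assumption that the Intercept column
# already supplies the offset.
eta_per_row = np.sqrt(x_train) @ a
print(eta_per_row.shape)  # (6,): one dimension, matching shape=n
```

If the per-row form is what was intended, `alpha` and `beta` would each have length n, which is consistent with `shape=x.shape[0]`.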