Hi,
I am trying to sample from a probit stick-breaking process using PyMC (version 5.10.4), but I keep getting an error saying the initial evaluation of the model failed at the starting point.
The following is my code; it is almost identical to the post here:
```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

N = X_tr.shape[0]  # 420
J = X_tr.shape[1]  # 6
K = 10  # maximum number of clusters


def stick_breaking(v):
    return v * pt.concatenate(
        [pt.ones_like(v[:, :1]), pt.extra_ops.cumprod(1 - v, axis=1)[:, :-1]], axis=1
    )


def norm_cdf(z):
    return 0.5 * (1 + pt.erf(z / np.sqrt(2)))


with pm.Model(
    coords={"component": np.arange(K) + 1, "obs_id": np.arange(N), "J": np.arange(J)}
) as model:
    alpha = pm.Normal("alpha", 0.0, 5.0, dims="component")
    beta = pm.Normal("beta", 0.0, 5.0, dims=("J", "component"))
    x = pm.MutableData("x", X_tr.to_numpy())  # use MutableData to allow for prediction
    v = norm_cdf(alpha + pm.math.dot(x, beta))
    w = pm.Deterministic("w", stick_breaking(v), dims=["obs_id", "component"])
    sigma = pm.Gamma("sigma", 1.0, 1.0, dims="component")
    # transform = pm.distributions.transforms.univariate_ordered
    coef = pm.Normal("coef", mu=0, sigma=2, dims=("J", "component"))
    # input the data
    y = pm.Data("y", y_tr.to_list())
    # mu = x @ coef; for each observation, mu has dimension "component"
    mu = pm.Deterministic("mu", pm.math.dot(x, coef))
    obs = pm.NormalMixture(
        "obs",
        w,
        mu,
        sigma=sigma,
        observed=y,
        # for a single y, both w and mu have dimension "component"
        dims="obs_id",
    )

with model:
    # Posterior sampling
    trace = pm.sample(1000, chains=1, return_inferencedata=False)
```
And I got the following error:
```
SamplingError                             Traceback (most recent call last)
in <cell line: 2>()
      2 with model:
      3     # Posterior sampling
----> 4     trace = pm.sample(1000, chains=1, return_inferencedata=False)
      5 # Generate posterior predictive samples
      6 #X_new = np.linspace(0, 15, 100)[:, None]

1 frames
/usr/local/lib/python3.10/dist-packages/pymc/model/core.py in check_start_vals(self, start)
   1651
   1652         if not all(np.isfinite(v) for v in initial_eval.values()):
-> 1653             raise SamplingError(
   1654                 "Initial evaluation of model at starting point failed!\n"
   1655                 f"Starting values:\n{elem}\n\n"

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'alpha': array([-0.03270944, -0.59396931,  0.12858193,  0.14665458, -0.10482884,
        0.44430566, -0.06075686, -0.0763148 ,  0.71447235, -0.04947083]), 'beta': array([[ 0.71017244, -0.84745191,  0.09199422, -0.92178166,  0.35782199,
-0.31069918, -0.70137274, 0.58921903, 0.90338207, 0.01201175],
[ 0.81559558, 0.25362333, 0.62969751, 0.66624938, 0.24056323,
-0.68136207, -0.65085853, -0.21432691, 0.99613048, 0.07285295],
[ 0.98507979, -0.47198728, 0.43196516, -0.18781016, 0.13540294,
-0.49876927, 0.20280524, -0.72443258, 0.30696514, 0.00474679],
[-0.94975048, -0.20677565, -0.1611585 , -0.26073709, -0.40670049,
0.2342001 , 0.99548869, 0.07788006, 0.60348358, 0.08239775],
[ 0.18510105, -0.98194933, -0.14034538, 0.26389123, 0.35964213,
0.57693397, -0.22731952, -0.87325861, 0.63861162, 0.72851119],
[ 0.41317165, -0.10432875, 0.96741211, 0.10580337, -0.95195856,
       -0.44381268, -0.12168857,  0.64722997,  0.34687489, -0.25755547]]), 'sigma_log__': array([-0.2698805 ,  0.47953377, -0.70755573, -0.83080522, -0.14251744,
        0.96693805,  0.8889986 , -0.23951238,  0.39014642,  0.17711656]), 'coef': array([[ 0.37569821, -0.68960897,  0.71219097, -0.90556031, -0.89636317,
0.06267078, -0.17411971, -0.51039049, 0.87399132, -0.87842608],
[ 0.64093635, 0.31218706, 0.12399261, 0.56654999, -0.21788175,
-0.0047365 , 0.74959882, 0.79940618, 0.38664556, -0.93478344],
[ 0.22728805, 0.26101639, -0.15065008, -0.03823388, 0.25933377,
0.55018635, 0.31458068, 0.76147923, 0.53930128, -0.78278249],
[ 0.40041729, -0.80436768, 0.79529463, 0.32022015, -0.6642901 ,
0.30358946, 0.61153408, -0.85714695, -0.22661507, -0.99668949],
[ 0.16795301, -0.96886254, 0.24453266, 0.76115628, 0.89750362,
0.09961963, 0.49612719, 0.53264433, -0.61970289, 0.22666013],
[ 0.98905904, 0.70210809, -0.59168376, -0.55795283, 0.53754648,
        0.19716335,  0.97059653,  0.51926715, -0.37808405, -0.16489517]])}

Logp initial evaluation results:
{'alpha': -25.31, 'beta': -152.08, 'sigma': -11.98, 'coef': -99.38, 'obs': -inf}
You can call `model.debug()` for more details.
```
Further debugging seems to imply that the weights in the mixture model might be the problem.
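Following the hint at the end of the error message, I ran the model debugger; the call below is how I produced the output that follows:

```python
# Evaluate the model at its initial point and report the offending variable(s).
model.debug()
```

This printed: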
```
point={'alpha': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'beta': array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]), 'sigma_log__': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'coef': array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])}

The variable obs has the following parameters:
0: Mul [id A] <Matrix(float64, shape=(?, 10))> 'w'
├─ [[0.5]] [id B] <Matrix(float64, shape=(1, 1))>
├─ Erfc [id C] <Matrix(float64, shape=(?, 10))>
│ └─ Mul [id D] <Matrix(float64, shape=(?, 10))>
│ ├─ [[-0.70710678]] [id E] <Matrix(float64, shape=(1, 1))>
│ └─ Add [id F] <Matrix(float64, shape=(?, 10))>
│ ├─ ExpandDims{axis=0} [id G] <Matrix(float64, shape=(1, 10))>
│ │ └─ alpha [id H] <Vector(float64, shape=(10,))>
│ └─ Dot22 [id I] <Matrix(float64, shape=(?, 10))>
│ ├─ x [id J] <Matrix(float64, shape=(?, ?))>
│ └─ beta [id K] <Matrix(float64, shape=(6, 10))>
└─ Join [id L] <Matrix(float64, shape=(?, 10))>
├─ 1 [id M] <Scalar(int8, shape=())>
├─ Alloc [id N] <Matrix(float64, shape=(?, 1))>
│ ├─ [[1.]] [id O] <Matrix(float64, shape=(1, 1))>
│ ├─ Shape_i{0} [id P] <Scalar(int64, shape=())>
│ │ └─ x [id J] <Matrix(float64, shape=(?, ?))>
│ └─ 1 [id Q] <Scalar(int64, shape=())>
└─ Subtensor{:, :stop} [id R] <Matrix(float64, shape=(?, 9))>
├─ CumOp{1, mul} [id S] <Matrix(float64, shape=(?, 10))>
│ └─ Sub [id T] <Matrix(float64, shape=(?, 10))>
│ ├─ [[1.]] [id O] <Matrix(float64, shape=(1, 1))>
│ └─ Mul [id U] <Matrix(float64, shape=(?, 10))>
│ ├─ [[0.5]] [id B] <Matrix(float64, shape=(1, 1))>
│ └─ Erfc [id C] <Matrix(float64, shape=(?, 10))>
│ └─ ···
└─ -1 [id V]
1: normal_rv{0, (0, 0), floatX, False}.1 [id W] <Matrix(float64, shape=(420, 10))>
├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7E0463F61460>) [id X]
├─ [420 10] [id Y] <Vector(int64, shape=(2,))>
├─ 11 [id Z] <Scalar(int64, shape=())>
│ ├─ Dot22 [id BA] <Matrix(float64, shape=(?, 10))> 'mu'
│ ├─ x [id J] <Matrix(float64, shape=(?, ?))>
│ └─ coef [id BB] <Matrix(float64, shape=(6, 10))>
└─ Exp [id BC] <Vector(float64, shape=(10,))> 'sigma'
└─ sigma_log__ [id BD] <Vector(float64, shape=(10,))>
The parameters evaluate to:
0: [[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]
[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]
[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]
…
[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]
[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]
[0.5 0.25 0.125 … 0.00390625 0.00195312 0.00097656]]
1: [[-1.95926723 -0.85367693 -0.56795929 … 1.73544166 0.12485065
-1.26401021]
[-2.00982609 0.4413363 0.47774306 … 1.64401346 -1.04242287
0.16993242]
[ 0.61234729 0.92983145 -1.21546406 … 0.22537962 -0.57043356
-0.47701534]
…
[-0.83186945 0.40375499 1.09441776 … -0.42199167 -0.23217083
0.21192489]
[-0.43426992 -1.06571056 0.47856101 … 1.06958692 0.51942876
0.09085738]
[-0.42334862 0.6841674 -0.094333 … -0.13844733 1.14350897
0.20857583]]
This does not respect one of the following constraints: 0 <= weights <= 1, sum(weights) == 1
Apply node that caused the error: Check{0 <= weights <= 1, sum(weights) == 1}(Composite{(i1 + log(i0))}.0, All{axes=None}.0)
Toposort index: 30
Inputs types: [TensorType(float64, shape=(420,)), TensorType(bool, shape=())]
Inputs shapes: [(420,), ()]
Inputs strides: [(8,), ()]
Inputs values: ['not shown', array(False)]
Outputs clients: [['output']]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "/usr/local/lib/python3.10/dist-packages/pymc/model/core.py", line 719, in logp
    rv_logps = transformed_conditional_logp(
  File "/usr/local/lib/python3.10/dist-packages/pymc/logprob/basic.py", line 612, in transformed_conditional_logp
    temp_logp_terms = conditional_logp(
  File "/usr/local/lib/python3.10/dist-packages/pymc/logprob/basic.py", line 542, in conditional_logp
    q_logprob_vars = _logprob(
  File "/usr/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.10/dist-packages/pymc/distributions/mixture.py", line 358, in marginal_mixture_logprob
    mix_logp = check_parameters(
  File "/usr/local/lib/python3.10/dist-packages/pymc/distributions/dist_math.py", line 73, in check_parameters
    return CheckParameterValue(msg, can_be_replaced_by_ninf)(expr, all_true_scalar)
  File "/usr/local/lib/python3.10/dist-packages/pytensor/graph/op.py", line 295, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytensor/raise_op.py", line 91, in make_node
    [value.type()],

HINT: Use the PyTensor flag exception_verbosity=high for a debug print-out and storage map footprint of this Apply node.
```
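If I evaluate the truncated stick-breaking weights at that all-zeros point by hand (there every v equals norm_cdf(0) = 0.5, which matches the 0.5, 0.25, ... row printed above), they sum to slightly less than 1, which seems to be exactly what trips the sum(weights) == 1 check. A quick sanity check of my own (not part of the traceback):

```python
import numpy as np

# With v = 0.5 everywhere, the K = 10 truncated stick-breaking weights are
# 0.5, 0.25, ..., 0.5**10 -- the same row of values printed above.
w = 0.5 ** np.arange(1, 11)
print(w.sum())  # 0.9990234375 == 1 - 0.5**10, so sum(weights) == 1 fails
```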
Is there any way I can ensure the weights add up to 1? Or are there other problems with this code? Any advice would be greatly appreciated. Thank you!
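One idea I had is to renormalize each row of w so the K truncated weights sum to exactly 1. This is just a sketch of what I mean (untested, and I am not sure it is statistically the right fix):

```python
def stick_breaking(v):
    # Truncated stick-breaking as before ...
    w = v * pt.concatenate(
        [pt.ones_like(v[:, :1]), pt.extra_ops.cumprod(1 - v, axis=1)[:, :-1]], axis=1
    )
    # ... then rescale each row so the weights sum to exactly 1.
    return w / w.sum(axis=1, keepdims=True)
```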
Best,
LU