In the following model, the observed variable AD is 2-dimensional and Poisson distributed. Each row of AD corresponds to 1 “position” and is associated with 1 AF. The number of positions (nPos) is fixed. Each position is associated with multiple “samples” which correspond to the columns (nSample) and the number of samples observed varies among positions. The ‘missing’ samples are np.nan in the input array. So in a way, the samples are pooled at the ‘position’ level and the positions are pooled at the AF level so that AF for positions are sampled from a single Gamma distribution. TD is an observed covariate that has the same dimensioan as AD and np.nan in exactly the same coordinates in the matrix.
Now, with missing data filled as np.nan, inference gave an error ‘SamplingError: Initial evaluation of model at starting point failed!’ with -inf likelihood for AD, which I assume to be caused by the np.nan. My question is, without imputation, how do I rewrite the model to make it work?
with pm.Model() as m:
TD_observed = pm.MutableData('TD_observed', np.array(dataDict['TD_observed'])) # (nPos, nSample)
AD_observed = pm.MutableData('AD_observed', np.array(dataDict['AD_observed'])) # (nPos, nSample)
gammaMu = pm.LogNormal("gammaMu", mu=1, sigma=1) # (1,)
gammaStd = pm.LogNormal("gammaStd", mu=1, sigma=1) # (1,)
gammaShape = pm.Deterministic('gammaShape', gammaMu ** 2 / gammaStd ** 2) # (1,)
gammaBeta = pm.Deterministic('gammaBeta', (gammaShape / gammaMu)) # (1,)
AF = pm.Gamma('AF', alpha=gammaShape, beta=gammaBeta, shape=(dataDict['nPos'], 1)) # (nPos, 1)
AF_coeff = .0001
lambda_p = pm.Deterministic('lambda_p', TD_observed * AF * AF_coeff) # (nPos, nSample) * (nPos, 1) > (nPos, nSample)
AD = pm.Poisson('AD', mu=lambda_p, observed=AD_observed) # (nPos, nSample)
Any help is appreciated.