Geometric variable initialization


#1

Hi,

I saw that geometric variables by default get initialized to 1 (via self.mode = 1).
Any particular reason for that as opposed to the mean (1/p)?

Initializing at 1 causes problems for the Metropolis sampler, especially when the shape N of the geometric is >> 1. The Metropolis proposal is by default a “discretized” Gaussian, so there’s a good chance that a proposal will take you from [1,1,1,…] to [1,0,1,…], whose log-likelihood is -inf.
Hence the chain has a really hard time leaving the initial state: if N is large, it’s very likely that at least one dimension of the proposal contains a 0, and the whole proposal is then discarded.
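
To put a rough number on this, here is a minimal sketch. It assumes the proposal is an independent Gaussian step per dimension, rounded to the nearest integer, with unit scale (the function names and the scale are illustrative choices, not PyMC internals). It only computes the chance that a proposal stays inside the support (every coordinate >= 1); a proposal that leaves the support is always rejected, so this is an upper bound on the acceptance probability.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_proposal_in_support(n_dims, start=1, scale=1.0):
    """Chance that all n_dims coordinates of a rounded-Gaussian proposal stay >= 1.

    Assumes every coordinate currently sits at `start` and that each
    coordinate gets an independent step round(Normal(0, scale)).
    """
    # A coordinate leaves the support when the rounded step is <= -start,
    # i.e. when the Gaussian draw falls below -(start - 0.5).
    p_leave = normal_cdf(-(start - 0.5) / scale)
    return (1.0 - p_leave) ** n_dims


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, prob_proposal_in_support(n))
```

Under these assumptions a single dimension stays in the support about 69% of the time, but with N = 50 the probability drops below 1e-7, which matches the "stuck at the initial state" behaviour.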

What’s the general strategy for sampling discrete RVs in PyMC anyway? I’ve seen a few examples of binary RVs, but nothing for general discrete RVs.


#2

If I remember correctly, the default value of every discrete RV is its mode, since the mean could be a non-integer value and thus outside the support of the RV.

The inefficiency of random-walk Metropolis in high dimensions is well known. You can pass testval=2 to set a different start value for the RV, or pass a dict of start values to pm.sample().

PyMC3 assigns different samplers to discrete RVs, but apart from a few special cases (Bernoulli, Categorical) they are sampled with random-walk Metropolis.