Inspired by reading BDA3, I wrote something on Dirichlet-Multinomials and PyMC3 someone might find useful:
Really interesting and well-written, thank you @clausherther!
Just a thought: when writing the first `polling_model`, technically speaking, I guess you would have to specify that the Dirichlet prior is of shape (3,), as you do in the second one? But you don’t, because PyMC can infer it from the vector `a`, right?
Thanks again for your post. I’ve been reading up on DM models too, and there aren’t many quality resources on them.
You’re right, I was a bit lazy in leaving out the `shape` parameter in the first example. Thanks for reading!
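For readers wondering why the shape can be left out: the concentration vector already carries it. The snippet below is a NumPy-only sketch of that idea, not the PyMC3 API; the three-option prior `a` and the seed are illustrative assumptions.

```python
import numpy as np

# Hypothetical uniform prior over three poll options (illustrative values).
a = np.ones(3)

rng = np.random.default_rng(0)
theta = rng.dirichlet(a)  # one probability per concentration parameter

# The shape of a draw follows from the shape of `a`,
# which is why it can be inferred rather than passed explicitly.
inferred_shape = np.shape(a)
print(inferred_shape, theta.shape)
```

Passing the shape explicitly, as in the second model, is still the safer habit because it does not rely on inference.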
Hi @clausherther! Piggybacking on our discussion above, I wrote and implemented a model to forecast European elections in France: https://www.pollsposition.com/indicateurs/forecast_europeennes2019
All in PyMC3, of course. I thought that would interest you! Ping me back if you have any feedback!
Hi @clausherther and @AlexAndorra ! I was trying to go through your code for Polling #2 (how much did Bush’s approval shift after the debate), and I am running into an issue. When I run:
```python
with pm.Model() as polling_model_debates:
    # initializes the Dirichlet distribution with a uniform prior:
    shape = (n_debates, n_candidates)
    a = np.ones(shape)

    # This creates a separate Dirichlet distribution for each debate
    # where sum of probabilities across candidates = 100% for each debate
    theta = pm.Dirichlet("theta", a=a, shape=shape)

    # get the "Bush" theta for each debate, at index=0
    bush_pref = pm.Deterministic("bush_pref", theta[:, 0] * n / m)

    # to calculate probability that support for Bush shifted from debate 1 [0] to 2 [1]
    bush_shift = pm.Deterministic("bush_shift", bush_pref[1]-bush_pref[0])

    # because of the shapes of the inputs, this essentially creates 2 multinomials,
    # one for each debate
    responses = pm.Multinomial("responses", n=n, p=theta, observed=y)
```
I get the following error:
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_1809/2836093747.py in <module>
10
11 # get the "Bush" theta for each debate, at index=0
---> 12 bush_pref = pm.Deterministic("bush_pref", theta[:, 0] * n / m)
13
14 # to calculate probability that support for Bush shifted from debate 1 [0] to 2 [1]
/usr/local/lib/python3.8/site-packages/theano/tensor/var.py in __truediv__(self, other)
168
169 def __truediv__(self, other):
--> 170 return theano.tensor.basic.true_div(self, other)
171
172 def __floordiv__(self, other):
/usr/local/lib/python3.8/site-packages/theano/graph/op.py in __call__(self, *inputs, **kwargs)
251
252 if config.compute_test_value != "off":
--> 253 compute_test_value(node)
254
255 if self.default_output is not None:
/usr/local/lib/python3.8/site-packages/theano/graph/op.py in compute_test_value(node)
128 thunk.outputs = [storage_map[v] for v in node.outputs]
129
--> 130 required = thunk()
131 assert not required # We provided all inputs
132
/usr/local/lib/python3.8/site-packages/theano/graph/op.py in rval()
604
605 def rval():
--> 606 thunk()
607 for o in node.outputs:
608 compute_map[o][0] = True
/usr/local/lib/python3.8/site-packages/theano/link/c/basic.py in __call__(self)
1769 print(self.error_storage, file=sys.stderr)
1770 raise
-> 1771 raise exc_value.with_traceback(exc_trace)
1772
1773
ValueError: Input dimension mis-match. (input[0].shape[0] = 2, input[1].shape[0] = 5)
```
I commented out the two `Deterministic` sections and get the same error, though it is now pointing at the `Multinomial` line.
I've verified that `y` and `a` have the same dimensions, so it looks like an issue with `n`? But I don't know if that's right or how to fix it.
Thank you in advance!
@luerkene let me take a look and get back to you by the end of the week! What version of PyMC are you running? I see you’re on Python 3.8.
Thank you so much! I’m currently running version 3.11.2
@luerkene looks like in the newer versions `Multinomial` is more strict about input shapes. The `n` parameter is lazily defined as `(2,)` earlier.
If you resize it via `n.resize(2,1)` before you compile the model, it should work. I’ll try and update the post and notebook to the latest PyMC and Arviz versions soon.
Thanks for pointing this out!
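The shape issue can be reproduced with plain NumPy broadcasting. This is only an analogy sketched under assumptions: the arrays below are illustrative stand-ins for `theta` and `n`, and whether `Multinomial` performs exactly this multiplication internally is not confirmed here.

```python
import numpy as np

# Illustrative stand-ins: 2 debates, 3 candidates (not the notebook's data).
p = np.full((2, 3), 1 / 3)       # one row of probabilities per debate
n_flat = np.array([639, 639])    # shape (2,): one total per debate

try:
    p * n_flat                   # (2, 3) against (2,): trailing axes 3 and 2 differ
    flat_ok = True
except ValueError:
    flat_ok = False

n_col = n_flat.reshape(-1, 1)    # shape (2, 1): broadcasts across the candidate axis
expected_counts = p * n_col      # shape (2, 3)
print(flat_ok, expected_counts.shape)
```

A column of totals lines up with the debate axis of the probability matrix, while a flat vector is matched against the candidate axis instead.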
Thank you so much, @clausherther! I made the change but now get a different error. Any tips on this?
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_1809/2836093747.py in <module>
10
11 # get the "Bush" theta for each debate, at index=0
---> 12 bush_pref = pm.Deterministic("bush_pref", theta[:, 0] * n / m)
13
14 # to calculate probability that support for Bush shifted from debate 1 [0] to 2 [1]
TypeError: unsupported operand type(s) for *: 'TensorVariable' and 'NoneType'
```
I’ve again commented out the `Deterministic` RVs and still get an error (here I just resized `n` on the spot…):
```
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in dtype_specs(self)
264 try:
--> 265 return {
266 "float16": (float, "npy_float16", "NPY_FLOAT16"),
KeyError: 'object'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/tmp/ipykernel_1809/3312608857.py in <module>
17 # because of the shapes of the inputs, this essentially creates 2 multinomials,
18 # one for each debate
---> 19 responses = pm.Multinomial("responses", n=n.resize(2,1), p=theta, observed=y)
/usr/local/lib/python3.8/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
119 dist = cls.dist(*args, **kwargs, shape=shape)
120 else:
--> 121 dist = cls.dist(*args, **kwargs)
122 return model.Var(name, dist, data, total_size, dims=dims)
123
/usr/local/lib/python3.8/site-packages/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
128 def dist(cls, *args, **kwargs):
129 dist = object.__new__(cls)
--> 130 dist.__init__(*args, **kwargs)
131 return dist
132
/usr/local/lib/python3.8/site-packages/pymc3/distributions/multivariate.py in __init__(self, n, p, *args, **kwargs)
574 else:
575 # n is a scalar, p is a 1d array
--> 576 self.n = tt.as_tensor_variable(n)
577 self.p = tt.as_tensor_variable(p)
578
/usr/local/lib/python3.8/site-packages/theano/tensor/basic.py in as_tensor_variable(x, name, ndim)
205 )
206
--> 207 return constant(x, name=name, ndim=ndim)
208
209
/usr/local/lib/python3.8/site-packages/theano/tensor/basic.py in constant(x, name, ndim, dtype)
253 assert x_.ndim == ndim
254
--> 255 ttype = TensorType(dtype=x_.dtype, broadcastable=[s == 1 for s in x_.shape])
256
257 try:
/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in __init__(self, dtype, broadcastable, name, sparse_grad)
52 # True or False
53 self.broadcastable = tuple(bool(b) for b in broadcastable)
---> 54 self.dtype_specs() # error checking is done there
55 self.name = name
56 self.numpy_dtype = np.dtype(self.dtype)
/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in dtype_specs(self)
280 }[self.dtype]
281 except KeyError:
--> 282 raise TypeError(
283 f"Unsupported dtype for {self.__class__.__name__}: {self.dtype}"
284 )
TypeError: Unsupported dtype for TensorType: object
```
Hey @clausherther, I figured it out. For whatever reason I had to call `n = np.reshape(n, (-1,1))` and everything worked. Thank you again for your help and for the post! I really enjoyed it and am looking forward to reading anything else you publish.
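A likely reason `np.reshape` worked where the `n.resize(2,1)` calls in this thread did not: `ndarray.resize` modifies the array in place and returns `None`, so using its return value hands `None` to the model, which matches the `NoneType` and `object` dtype errors quoted earlier. The sketch below shows the difference with illustrative totals (the values and variable names are assumptions, not the notebook's data).

```python
import numpy as np

n = np.array([639, 639])  # illustrative totals, shape (2,)

# ndarray.resize works in place and returns None.
# refcheck=False is only here so the sketch also runs in notebooks or
# debuggers that hold extra references to the array.
returned = n.resize((2, 1), refcheck=False)
print(returned, n.shape)

# Wrapping None in an array yields dtype object, which tensor libraries tend to reject.
none_dtype = np.asarray(returned).dtype

# np.reshape returns the reshaped array, so its result is safe to assign or pass on.
totals = np.array([639, 639])
totals_col = np.reshape(totals, (-1, 1))
print(none_dtype, totals_col.shape)
```

Calling `n.resize(2, 1)` on its own line and then passing `n` would also work; the problem arises only when the return value of `resize` is used.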