Is your CPU AMD? I think amdlibm only applies to AMD CPUs.
What I found surprising is that some of the operations are not compiled to C, but I have not looked at any profiling results for a long time, so I might be misinformed.
Pinging @aseyboldt to see if he has some tips on profiling and speeding up ops.
Mmmh, not sure how to answer – sorry, I am truly clueless about this stuff
Here is my machine’s info (it’s a Mac), hoping that it answers your question:
```
compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 15.6.0
machine    : x86_64
processor  : i386
CPU cores  : 2
interpreter: 64bit
```
Macs only use Intel CPUs, I think – and did you set up your environment using conda? (If so, Intel MKL should be set up correctly.)
Yes, it’s a conda env – don’t know if it’s useful, but my Mac is quite old (mid-2009).
FYI, I just ran the same model, but without modeling the covariation between intercepts and slopes, and it took 10 minutes to sample. So it seems that it really is the addition of the covariance structure (MvNormal) that considerably slows sampling.
Just realized I can mark this as solved thanks to @junpenglao’s suggestion of giving non-available categories probability 0 – this implies we know in advance which categories are absent. Of course, I’m still curious about @aseyboldt’s thoughts on profiling and speed-ups, if he has any!
Here is a simplified version of the model, for the sake of readability and understanding how to implement the idea with PyMC:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as m_multi:
    # varying intercepts for each cluster and each non-pivot category:
    a_cluster = pm.Normal("a_cluster", -1.8, 0.5, shape=(Nclusters, Ncategories - 1))
    # fixed intercept for the pivot (reference) category:
    a_pivot = tt.as_tensor_variable(np.full(shape=(Nclusters, 1), fill_value=-2.2))
    a_cluster_f = tt.horizontal_stack(a_cluster, a_pivot)
    # latent category preferences:
    lat_p = tt.nnet.softmax(a_cluster_f[cluster_id])

    # zero-inflation process:
    # keep only preferences for available categories:
    slot_p = cats_available * lat_p
    # normalize preferences:
    slot_p = slot_p / tt.sum(slot_p, axis=1, keepdims=True)

    R = pm.Multinomial("R", n=N, p=slot_p, observed=sim_R)
    trace_multi = pm.sample(2000, tune=3000, init="adapt_diag")
```
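To see what the masking-and-renormalization step does on its own, here is a minimal NumPy sketch (the preference values and availability mask below are made up for illustration): unavailable categories are zeroed out, and the remaining probabilities are rescaled so each row still sums to 1.

```python
import numpy as np

# Hypothetical latent preferences for 3 categories (e.g. from a softmax),
# with the last category unavailable (mask = 0):
lat_p = np.array([[0.5, 0.3, 0.2]])
cats_available = np.array([[1.0, 1.0, 0.0]])

# Zero out unavailable categories, then renormalize each row:
slot_p = cats_available * lat_p
slot_p = slot_p / slot_p.sum(axis=1, keepdims=True)

print(slot_p)  # [[0.625 0.375 0.   ]]
```

The relative preferences among available categories are preserved (0.5 : 0.3 stays 0.625 : 0.375), while the absent category gets exactly probability 0, which is what the Multinomial likelihood needs.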
Hope this will help future users!
For the curious out there, the real model is much more complicated (varying-effects and covariance) but I’ll open source it – as well as real and simulated data – once I’ve had time to clean it up!
And again, a big thank you to Junpeng! Couldn’t have done it without your help! Now I owe you a second beer when you come to Paris.