Modeling discrete non-theano observed data

I understand how to model numerical observed data, but what is the correct way to model arbitrary observed categorical data (e.g., a list of Python objects)?
I can come up with the logp function, but using as_op with these values was unsuccessful.

Discrete data can be enumerated, which means you can simply use a number to represent each value; you just need to keep the mapping between objects and numbers around. If the discrete values are strings and I already have them in a pandas DataFrame, I usually use the ‘category’ dtype (df['column'].astype('category')). You can then use and to map between integers and objects.
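A minimal sketch of that enumeration with pandas, assuming the values are strings in a hypothetical column named "column". It uses the categorical accessor's `cat.codes` (one integer per row) and `cat.categories` (the code-to-value table); these are one way to do the mapping, not necessarily the exact calls meant above.

```python
import pandas as pd

# Hypothetical observed data: one discrete, non-numeric value per row.
df = pd.DataFrame({"column": ["red", "green", "red", "blue", "green"]})

# Convert to the 'category' dtype; pandas then stores an integer code per row.
df["column"] = df["column"].astype("category")

# Integer codes that could serve as numeric observed data.
codes = df["column"].cat.codes.to_numpy()

# Index whose position i holds the original value for code i (sorted: blue, green, red).
categories = df["column"].cat.categories

# Decode: integer -> original value.
decoded = [categories[c] for c in codes]

# Encode new values: original value -> integer.
code_of = {value: code for code, value in enumerate(categories)}
```

The resulting `codes` array is plain integers, so it could, for example, be passed as the observed data of a discrete distribution such as PyMC3's `Categorical`; that wiring is an assumption and is not shown here.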

Is this the correct way?
Doesn't this involve a lot of overhead, converting complex objects (like a graph) into a unique string or number and back again?
If I can implement logp myself, why must these observed values be enumerated?
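Enumerating the observed values does not prevent a hand-written logp: the integers only carry identity, and the logp can decode them back to the original objects in ordinary Python. A minimal sketch, assuming a hypothetical lookup list `objects` of small graphs (tuples of edges) and a toy per-object model, `object_logp`, in which the edge count is geometric with parameter `theta`. Both are placeholders for the real model.

```python
import math

import numpy as np

# Hypothetical lookup table: position i holds the object that code i stands for.
# Each "object" here is a small graph written as a tuple of edges.
objects = [
    (("a", "b"),),
    (("a", "b"), ("b", "c")),
    (("a", "b"), ("b", "c"), ("c", "a")),
]


def object_logp(graph, theta):
    """Hypothetical per-object log-probability: a toy model in which the
    number of edges k >= 1 is geometric, P(k) = (1 - theta)**(k - 1) * theta."""
    k = len(graph)
    return (k - 1) * math.log(1.0 - theta) + math.log(theta)


def logp_from_codes(codes, theta):
    """Total log-probability of the observed data, given only integer codes.
    Each code is decoded back to its object before the logp is evaluated."""
    return float(sum(object_logp(objects[c], theta) for c in np.asarray(codes)))
```

In PyMC3 such a function would presumably be wrapped with Theano's `as_op` (declaring an integer vector and a float scalar as inputs and a float scalar as output) and added to the model, e.g. through `pm.Potential`. That wiring is an untested assumption, and an `as_op` op provides no gradient, so gradient-based samplers such as NUTS could not be used for parameters feeding into it.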

I don’t think theano supports arrays with an object dtype properly (in theory you can define your own subclass of tt.Type, but I don’t think you can expect that to work nicely).
I’m not sure why that would be much overhead: you only need a dict or similar mapping your graphs to numbers. That dict doesn’t even need to be constant; you can add entries during sampling. In general, though, if you have that many complicated discrete states, I’m not sure PyMC is the right tool, since most of it is focused on continuous models.
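To illustrate the point that the mapping does not have to be fixed in advance, here is a minimal sketch of a growable registry that hands out the next free integer the first time it sees an object. It only ever stores objects that have actually been encountered. `ObjectRegistry` and `canonical_graph` are hypothetical helpers; the canonical key treats two edge lists as the same graph only if they have the same labeled edges (graph isomorphism is not handled).

```python
class ObjectRegistry:
    """Hypothetical helper: assigns a stable integer id to each distinct
    hashable object on first sight and maps ids back to objects."""

    def __init__(self):
        self._to_id = {}     # object -> integer id
        self._objects = []   # integer id -> object

    def encode(self, obj):
        # Assign the next free integer on first sight; reuse it afterwards.
        if obj not in self._to_id:
            self._to_id[obj] = len(self._objects)
            self._objects.append(obj)
        return self._to_id[obj]

    def decode(self, idx):
        return self._objects[idx]

    def __len__(self):
        return len(self._objects)


def canonical_graph(edges):
    """Hypothetical hashable key for an undirected, labeled graph given as an
    edge list: edges become sorted pairs inside a frozenset, so the key does
    not depend on edge order or on the direction an edge was written in."""
    return frozenset(tuple(sorted(e)) for e in edges)


registry = ObjectRegistry()
g1 = canonical_graph([("a", "b"), ("b", "c")])
g2 = canonical_graph([("c", "b"), ("b", "a")])  # same graph as g1, written differently
g3 = canonical_graph([("a", "c")])
ids = [registry.encode(g) for g in (g1, g2, g3)]  # ids is [0, 0, 1]
```

The design choice here is that ids are assigned lazily, so memory grows with the number of distinct objects seen rather than with the size of the full object space.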

Yes, that’s what I was worried about (PyMC3 not being the right tool).
A dict will not work, since the graph space is infinite.
Is there anything else I can do with “as_op”?