Modeling discrete non-theano observed data

zahybnaya · September 27, 2017, 5:37pm

Hi,
I understand how to model numerical observed data, but what is the correct way to model arbitrary observed categorical data (e.g., a list of Python objects)?
I can come up with the logp function, but my attempt to use as_op with value was unsuccessful.

aseyboldt · September 27, 2017, 5:54pm

Discrete data can be enumerated, which means you can use a number to represent each value. You just need to keep the mapping between objects and numbers around. If the discrete values are strings and I already have them in a pandas DataFrame, I usually use the 'category' dtype: df['column'].astype('category'). With the result stored in vals, you can then use vals.cat.codes and vals.cat.categories to map between integers and objects.
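
A minimal sketch of that round trip, assuming a toy DataFrame (the column contents and the variable names `codes`, `categories`, and `recovered` are made up for illustration):

```python
import pandas as pd

# Hypothetical data: one column of string-valued observations.
df = pd.DataFrame({"column": ["red", "green", "red", "blue"]})

# Convert to the 'category' dtype; pandas assigns one integer code per distinct value.
vals = df["column"].astype("category")

# Integer codes, one per row. These are what you would pass as observed data.
codes = vals.cat.codes.to_numpy()

# Index of the distinct values; position i holds the object encoded as i.
categories = vals.cat.categories

# Map the integer codes back to the original objects.
recovered = categories[codes]
```

For string categories pandas orders the categories lexically by default, so the codes follow that order rather than order of first appearance.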

zahybnaya · September 27, 2017, 6:53pm

Is this the correct way?
This involves a lot of overhead, converting complex objects (like a graph) into a unique string/number and back.
If I can produce the logp implementation why must these observed values be enumerated?
Thanks

aseyboldt · September 27, 2017, 7:08pm

I don’t think theano properly supports arrays with an object dtype. (In theory you can define your own subclass of tt.Type, but I don’t think you can expect that to work nicely.)
I’m not sure why that would be much overhead; you only need a dict or something similar mapping your graphs to numbers. That dict doesn’t even need to be constant: you can add entries during sampling. But in general, if you have that many complicated discrete states, I’m not sure pymc is the right tool, since most of it is focused on continuous models.
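
A rough sketch of such a growing mapping, in plain Python. The class name `ObjectEncoder` and the choice to represent each graph as a frozenset of edges are assumptions for illustration; the only real requirement is that the objects are hashable and compare equal when they represent the same state.

```python
class ObjectEncoder:
    """Assigns a stable integer code to each distinct hashable object (hypothetical helper)."""

    def __init__(self):
        self._code_of = {}    # object -> integer code
        self._object_of = []  # integer code -> object

    def encode(self, obj):
        # Reuse the existing code, or assign the next free integer on first sight.
        code = self._code_of.get(obj)
        if code is None:
            code = len(self._object_of)
            self._code_of[obj] = code
            self._object_of.append(obj)
        return code

    def decode(self, code):
        return self._object_of[code]


# Assumed representation: a graph is a frozenset of (source, target) edges.
g1 = frozenset({(0, 1), (1, 2)})
g2 = frozenset({(0, 1)})

enc = ObjectEncoder()
observed_codes = [enc.encode(g) for g in [g1, g2, g1]]

# A state first encountered later (e.g. during sampling) simply gets the next code.
new_code = enc.encode(frozenset({(2, 3)}))
```

Only the states actually visited are stored, so the table stays finite even if the space of possible objects is not.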

zahybnaya · September 27, 2017, 9:40pm

Yes. That’s what I was worried about (pymc3 not being the right tool).
A dict will not work, since the graph space is infinite.
Is there anything else I can do with “as_op”?
