[How to Improve] Complicated topic model with too many RVs fails to sample

Hi, I am new to PyMC3 and I am working on sentiment analysis with a topic model that has more nested latent variables and more hierarchies than LDA. I want to use Gibbs sampling in PyMC3 to infer the latent word distributions for both sentiment words and non-sentiment words.

In my model, each document consists of a sequence of tokens (words). Each token has a latent sentiment and a latent category label. The category label indicates whether the word is a sentiment word or a background word. The actual words are drawn from different distributions according to their latent category labels.
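
To make the generative process above concrete, here is a plain NumPy sketch of it (all sizes and hyperparameter values are toy assumptions for illustration, not my real data):

```python
import numpy as np

rng = np.random.default_rng(0)

V, S, C = 50, 2, 2                           # vocab size, sentiments, categories (toy values)
pi = rng.dirichlet(np.ones(C), size=C)       # category transition matrix, rows sum to 1
phi_s = rng.dirichlet(np.ones(V), size=S)    # word distributions for sentiment words
phi_c = rng.dirichlet(np.ones(V), size=C-1)  # word distributions for background words
theta = rng.dirichlet(np.ones(S))            # one document's sentiment distribution

def generate_review(n_tokens):
    """Draw one review: each token gets a sentiment, a category
    (Markov-chained on the previous token's category), and a word."""
    words, c_pre = [], None
    for _ in range(n_tokens):
        s = rng.choice(S, p=theta)
        if c_pre is None:                    # first token: uniform category
            c = rng.choice(C)
        else:                                # later tokens: transition row
            c = rng.choice(C, p=pi[c_pre])
        if c == 1:                           # sentiment word
            w = rng.choice(V, p=phi_s[s])
        else:                                # background word
            w = rng.choice(V, p=phi_c[c])
        words.append(w)
        c_pre = c
    return words

review = generate_review(8)
```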

I have two general questions. The first is that I get the following error when running the code (see below).

Exception: Compilation failed (return status=1):
/Users/wang/.theano/compiledir_Darwin-18.6.0-x86_64-i386-64bit-i386-3.7.3-64/tmpb20_3emw/mod.cpp:37020:32: fatal error: bracket nesting level exceeded maximum of 256
        if (!PyErr_Occurred()) {
                               ^
/Users/wang/.theano/compiledir_Darwin-18.6.0-x86_64-i386-64bit-i386-3.7.3-64/tmpb20_3emw/mod.cpp:37020:32: note: use -fbracket-depth=N to increase maximum nesting level
1 error generated.
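
For what it's worth, this particular clang error can usually be worked around (independently of the modelling problem) by raising the bracket-nesting limit Theano passes to the compiler, as the error message itself suggests. One way, assuming your script is launched from the shell (`run_model.py` is a placeholder name):

```shell
# Raise clang's bracket-nesting limit before running the script
export THEANO_FLAGS="gcc.cxxflags=-fbracket-depth=1024"
python run_model.py
```

This only lets the huge graph compile; it does not make it small or fast, so the vectorization question below still matters.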

Code:

### Model specification

logging.info("Starting model")
with pm.Model() as model:
    
    pi = pm.Dirichlet("pi", a=gamma, shape=(C, C))
    phi_s = pm.Dirichlet("phi_s", a=beta_s, shape=(S, V))
    phi_c = pm.Dirichlet("phi_c", a=delta, shape=(C-1, V))
    logging.info("processed priors")

    theta_s = pm.Dirichlet("theta_s", a=alpha_s, shape=(D, S))  # Sentiment distribution
    logging.info("processed review level distribution")
    
    ## For each review
    for d, review in enumerate(X_id):
        
        # Batch logging every 10 documents
        if d % 10 == 0:
            logging.info("processing review {}".format(d))
        
        ## Per-word sentiments for this review
        # (renamed from S to avoid shadowing the global S = number of sentiments)
        s_d = pm.Categorical(f"s_{d}", p=theta_s[d], shape=len(review))

        ## For each word in review
        c_pre = None  # category of the previous word (in this review)
        for w, word in enumerate(review):
            s = s_d[w]
            # the first word has no transition; the test must be `is None`,
            # since `not c_pre` is also True when c_pre == 0
            if c_pre is None:
                c = pm.Categorical(f"c_{d}_{w}", p=np.ones(C) / C)
            else:
                c = pm.Categorical(f"c_{d}_{w}", p=pi[c_pre])

            # `if c == 1:` cannot branch on a symbolic variable: a Python `if`
            # is evaluated once at graph-build time, not per sample. Select
            # the word distribution symbolically instead
            # (requires `import theano.tensor as tt` at the top).
            p_word = tt.switch(tt.eq(c, 1),
                               phi_s[s],    # sentiment words
                               phi_c[0])    # background words (only row, C = 2)
            ww = pm.Categorical(f"ww_{d}_{w}", p=p_word, observed=word)

            c_pre = c

logging.info("sampling begins")
with model:
    trace = pm.sample(draws=10, tune=1, chains=1, nuts_kwargs={'target_accept': 0.9})
logging.info("sampling completes")

The second question is that I don’t know how to improve the model specification inside the with pm.Model() as model: block. I know I overused for loops there, but I don’t know how to improve it even after reading this tutorial. I tried moving pm.Categorical() and pm.Dirichlet() outside of the loops while keeping the same algorithm; it did make the model compile faster, but the same error remained. I don’t know what else I can do for the token generation, because it is conditioned on the category labels.
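
To show the kind of vectorization I have tried so far: the ragged reviews can be flattened into one token array plus a parallel document-index array, so a single vectorized random variable can replace the per-document loop (the data here is a toy stand-in for my X_id):

```python
import numpy as np

# Toy stand-in for X_id: three reviews of different lengths
X_id = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]

# Flatten the ragged reviews into one token vector plus a document index
tokens = np.concatenate([np.asarray(r) for r in X_id])           # shape (N,)
doc_idx = np.concatenate([np.full(len(r), d) for d, r in enumerate(X_id)])

# With these arrays, one vectorized RV covers every token, e.g.
#   s = pm.Categorical("s", p=theta_s[doc_idx], shape=len(tokens))
# and the observed words become a single Categorical over `tokens`.
```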

Attached parameter definitions:

### Global parameters

# number of documents
D = len(X_id)
logging.info("Setting D documents: {0}".format(D))

# number of unique words
V = len(nlp.vocab)
logging.info("Setting V Vocab: {0}".format(V))

# number of sentiments
S = 2
logging.info("Setting S sentiments: {0}".format(S))

# number of word categories
C = 2
logging.info("Setting C word categories: {0}".format(C))


### Hyperparameters

alpha_s = 50 * np.ones(S) / S  # prior for per-document sentiment distributions
beta_s = 0.1 * np.ones(V)  # prior for the sentiment-word distributions
delta = 0.1 * np.ones(V)   # prior for the background-word distributions
gamma = 0.1 * np.ones(C)   # prior for the category transition matrix

For these kinds of models, you usually need to rewrite them as a marginalized mixture model to be able to do effective inference. You can take a look at the following related topics:
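
As a toy illustration of what "marginalized" means here: instead of sampling the discrete category c per token, you sum it out of the word likelihood, p(w) = Σ_c p(c) · p(w | c). Once the likelihood no longer mentions c, only continuous parameters remain and NUTS can sample them. In NumPy (all probabilities invented for the example):

```python
import numpy as np

V = 5
p_c = np.array([0.4, 0.6])           # prior over the two categories
phi = np.array([
    [0.5, 0.2, 0.1, 0.1, 0.1],       # background word distribution
    [0.1, 0.1, 0.1, 0.2, 0.5],       # sentiment word distribution
])

# Marginal word distribution: p(w) = sum_c p(c) * p(w | c)
p_w = p_c @ phi

# The per-token log-likelihood of the observed words never mentions c
words = np.array([0, 4, 4, 1])
loglik = np.log(p_w[words]).sum()
```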

Thank you very much for your help. I will play around more with the code from these similar topics.