Example Code Taking a Very Long Time to Run?

Hello,

I am running the code for the following example:
https://docs.pymc.io/notebooks/dp_mix.html
The first step, “Initializing NUTS using advi…”, is taking a very long time to run. It has been a couple of hours and the estimated time is 36 hours. Is this normal?

Everything in PyMC3 seems to run in exponential time: when I double the data size, things take much longer. Is pip install all that needs to be done to run PyMC3?
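One way to check the scaling observation is to time a single likelihood evaluation at N and 2N points. The sketch below is a plain NumPy stand-in for the model's NormalMixture likelihood, not PyMC3's actual implementation; `normal_mixture_logp` and `time_logp` are hypothetical helpers written for illustration. Because it builds an N × K array, one evaluation costs O(N · K), so a timing ratio near 2 is what linear scaling looks like. The sampler's total run time also depends on how many gradient evaluations it performs, so this only checks the per-evaluation cost.

```python
import time

import numpy as np


def normal_mixture_logp(x, w, mu, tau):
    """Total log-likelihood of x under a K-component normal mixture.

    x has shape (N,); w, mu, tau have shape (K,); tau is the precision (1 / variance).
    Builds an (N, K) array of per-component log densities, so the cost is O(N * K).
    """
    x = np.asarray(x, dtype=float)[:, None]                  # (N, 1)
    w, mu, tau = (np.asarray(v, dtype=float) for v in (w, mu, tau))
    log_dens = 0.5 * np.log(tau / (2 * np.pi)) - 0.5 * tau * (x - mu) ** 2  # (N, K)
    a = np.log(w) + log_dens                                 # add log mixture weights
    m = a.max(axis=1, keepdims=True)                         # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))


def time_logp(n, k=30, repeats=20, seed=0):
    """Average wall-clock seconds per log-likelihood evaluation on n points."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    w = np.full(k, 1.0 / k)
    mu = rng.normal(size=k)
    tau = np.ones(k)
    start = time.perf_counter()
    for _ in range(repeats):
        normal_mixture_logp(x, w, mu, tau)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    t1, t2 = time_logp(10_000), time_logp(20_000)
    print(f"N=10000: {t1:.2e}s  N=20000: {t2:.2e}s  ratio: {t2 / t1:.2f}")
```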

Here is the code I am running, copied from the example’s page:

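import numpy as np
import pandas as pd
import pymc3 as pm
from theano import tensor as tt

# Old Faithful eruption data bundled with PyMC3
old_faithful_df = pd.read_csv(pm.get_data('old_faithful.csv'))
# Standardize the waiting times to zero mean and unit standard deviation
old_faithful_df['std_waiting'] = (old_faithful_df.waiting - old_faithful_df.waiting.mean()) / old_faithful_df.waiting.std()
N = old_faithful_df.shape[0]
K = 30  # truncation level: number of mixture components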


def stick_breaking(beta):
    # w_k = beta_k * prod_{j<k} (1 - beta_j): each weight is a fraction of the stick still left
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining


with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1., alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    tau = pm.Gamma('tau', 1., 1., shape=K)
    lambda_ = pm.Gamma('lambda_', 10., 1., shape=K)
    mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=K)
    obs = pm.NormalMixture('obs', w, mu, tau=lambda_ * tau,
                           observed=old_faithful_df.std_waiting.values)

SEED = 5132290  # from random.org
np.random.seed(SEED)
with model:
    trace = pm.sample(
        1000,
        tune=2500,
        chains=2,
        init='advi',
        target_accept=0.9,
        random_seed=SEED
    )
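The `stick_breaking` function above computes the truncated stick-breaking weights of the Dirichlet process with K pieces. A NumPy mirror of it (the name `stick_breaking_np` is hypothetical, written only for illustration) makes it easy to see what it returns: each weight takes a fraction `beta[k]` of what is left of a unit-length stick, so the weights sum to 1 minus the product of all `(1 - beta)`, and the leftover is the mass beyond the K components.

```python
import numpy as np


def stick_breaking_np(beta):
    """NumPy mirror of the Theano stick_breaking function in the model.

    w[k] = beta[k] * prod_{j<k} (1 - beta[j]).
    """
    beta = np.asarray(beta, dtype=float)
    # Length of stick remaining before each break: 1, (1-b0), (1-b0)(1-b1), ...
    portion_remaining = np.concatenate([[1.0], np.cumprod(1.0 - beta)[:-1]])
    return beta * portion_remaining


if __name__ == "__main__":
    w = stick_breaking_np([0.5, 0.5, 0.5])
    print(w)            # weights 0.5, 0.25, 0.125
    print(1 - w.sum())  # mass left on the stick after 3 breaks: 0.125
```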

Ah, good thing I looked at the Jupyter Notebook terminal:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.

It looks like I needed to install Anaconda to get m2w64-toolchain?
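The warning means Theano could not find g++. A quick check from the same Python environment shows whether a g++ executable is visible on the PATH of the current process. This is a minimal sketch using only the standard-library `shutil.which`; `find_cxx` is a hypothetical helper, and since Theano performs its own compiler detection, this only approximates Theano's check.

```python
import shutil


def find_cxx(candidates=("g++",)):
    """Return the full path of the first candidate compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    cxx = find_cxx()
    if cxx is None:
        print("g++ not found on PATH; expect Theano's slow Python fallback")
    else:
        print("g++ found at", cxx)
```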

The first step (ADVI initialization) now completes very quickly, but my notebook dies once it actually starts sampling:

A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration.

At this point, the notebook fails…

WARNING (theano.gof.compilelock): Overriding existing lock by dead process '24860' (I am process '14912')
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average Loss = 440.04:   5%|▍         | 9399/200000 [00:19<06:35, 481.32it/s]
Convergence achieved at 9400
Interrupted at 9,399 [4%]: Average Loss = 498.1
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [mu, lambda_, tau, beta, alpha]
Sampling 2 chains, 0 divergences:   0%|          | 0/7000 [00:00<?, ?draws/s]
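The 0/7000 in the progress bar is consistent with each of the 2 chains running 2500 tuning steps plus 1000 kept draws, as set in the pm.sample call above. A tiny sketch of that arithmetic (the helper names are hypothetical) also shows that most of the sampling work in this configuration is tuning:

```python
def total_nuts_draws(draws, tune, chains):
    """Iterations counted by the sampling progress bar: every chain runs
    `tune` warm-up steps followed by `draws` kept steps."""
    return chains * (tune + draws)


def tune_fraction(draws, tune):
    """Share of each chain's iterations spent on tuning."""
    return tune / (tune + draws)


if __name__ == "__main__":
    # Settings from the pm.sample call in the question
    print(total_nuts_draws(draws=1000, tune=2500, chains=2))  # 7000
    print(f"{tune_fraction(draws=1000, tune=2500):.0%}")      # 71%
```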

This post is helpful:

Now some convergence checks do not work, and I do not think this is a real solution. How can I use more than one core?

I installed PyMC3 3.8.
I have been trying to get it installed with Anaconda for hours.

Is there a more in-depth installation guide?

For whatever reason, I found Miniconda much more flexible. First, follow the Theano installation directions on their website, then the PyMC3 installation instructions on this website.
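Once both installs are done, a quick way to confirm which versions actually ended up in the active environment is to query Python's package metadata. This is a minimal sketch with the standard-library `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, and the distribution names "pymc3" and "theano" are an assumption about how the packages were installed.

```python
from importlib import metadata


def installed_versions(packages=("pymc3", "theano")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```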