PyMC3 within an Azure Function - timing out (30 mins+) at the pm.sample() stage

Hi there,
I have been using PyMC3 for a while and have never had any problems setting up and managing environments for it, until recently, when I wanted to deploy it within an Azure Function.

I have replicated the environment locally and the code works fine (as it has for a long time under similar environments); however, I am at a loss as to how to get it working on Azure.

The section of code is based upon https://docs.pymc.io/notebooks/rugby_analytics.html and is as follows:

  import logging

  import pymc3 as pm
  import theano.tensor as tt

  # num_teams, home_team and away_team (integer team index for each game) and
  # observed_home_goals / observed_away_goals are prepared earlier, as in the notebook.
  with pm.Model() as model:
    # global parameters
    home = pm.Flat('home')
    tau_att = pm.Gamma('tau_att', .1, .1)
    tau_def = pm.Gamma('tau_def', .1, .1)
    intercept = pm.Flat('intercept')

    # team-specific parameters
    atts_star = pm.Normal("atts_star",
                            mu=0.0,
                            tau=tau_att,
                            shape=num_teams)
    defs_star = pm.Normal("defs_star",
                            mu=0.0,
                            tau=tau_def,
                            shape=num_teams)

    # centre attack/defence strengths so each set sums to zero
    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))

    # expected goals (log link) for the home and away side of each game
    home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team])
    away_theta = tt.exp(intercept + atts[away_team] + defs[home_team])

    home_goals = pm.Poisson('home_goals', mu=home_theta, observed=observed_home_goals)
    away_goals = pm.Poisson('away_goals', mu=away_theta, observed=observed_away_goals)

  logging.info("Running PYMC Process")

  with model:
    # step = pm.Metropolis()
    # trace = pm.sample(2000, step=step, cores=1)
    trace = pm.sample(1000, tune=1000, cores=1)

When I run this locally, it completes in around 11 seconds ("Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 11 seconds.").

When I run this within an Azure Function, it gets as far as pm.sample() and then times out / never returns, i.e.:

[Screenshot 2020-08-17 at 15.19.38]
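To narrow down where the time goes, it may help to log the model's compilation separately from the sampling itself. Below is a minimal standard-library sketch; `timed_stage` and the `timings` dict are hypothetical names, not part of PyMC3. The commented usage assumes the `model` from the code above and PyMC3 3.x, where evaluating `model.logp` at `model.test_point` forces Theano to compile the log-probability function before `pm.sample()` is called.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("pymc_diagnostics")

# Hypothetical store of elapsed wall-clock seconds per stage name.
timings = {}


@contextmanager
def timed_stage(name):
    """Log the start of a stage and how long it took, even if it raises."""
    log.info("Starting %s", name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        log.info("Finished %s in %.1f s", name, elapsed)


# Usage sketch inside the function (assumes `model` and `pm` from above):
#
#   with model:
#       with timed_stage("compiling logp"):
#           model.logp(model.test_point)  # triggers Theano compilation
#       with timed_stage("sampling"):
#           trace = pm.sample(1000, tune=1000, cores=1)
```

If the Function log then shows "Starting compiling logp" with no matching "Finished" line, the hang is in compilation rather than in the sampler.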

I have tried different step methods, numbers of cores, draws and chains, as well as different versions of the prerequisites and of PyMC3, with no luck.

I am currently using Python 3.7 on Ubuntu 16.04 x64 and have added a bash script to the build pipeline to install the prerequisites:

# BLAS/LAPACK, Python build headers and compilers
sudo apt-get install -y libatlas-base-dev
sudo apt-get install -y python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git graphviz
sudo pip install Theano

sudo apt-get install -y g++-4.9

# register gcc/g++ 4.9 (priority 20) and 5 (priority 10); in auto mode the higher priority wins
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 10

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 10

# make cc / c++ resolve to the selected gcc / g++
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
sudo update-alternatives --set cc /usr/bin/gcc

sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
sudo update-alternatives --set c++ /usr/bin/g++

# install the Python dependencies into the folder deployed with the function app
pip install --target="./.python_packages/lib/site-packages" -r requirements.txt

requirements.txt:
azure-functions
scipy==1.3.1
snowflake-connector-python==2.2.2
snowflake-sqlalchemy==1.2.2
SQLAlchemy==1.3.7
pandas==1.0.1
sklearn==0.0
matplotlib==3.1.1
numpy==1.16.4
Theano==1.0.5
Cython==0.29.16
pymc3==3.9.3
statsmodels==0.10.1
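Because the prerequisites are installed by the build pipeline, it may be worth confirming that the compilers are actually visible to the running Function, not just to the build agent. A minimal standard-library sketch, assuming the Function logs via `logging`; `compiler_report`, `compiler_version` and `log_compilers` are hypothetical helper names. (Theano itself records the compiler it found in `theano.config.cxx`, which is an empty string when none was detected.)

```python
import logging
import shutil
import subprocess

log = logging.getLogger("compiler_check")


def compiler_report(names=("g++", "gcc", "c++", "cc")):
    """Map each compiler name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def compiler_version(path):
    """Return the first line of `<path> --version`, or None if it cannot run."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def log_compilers():
    """Log which compilers this process can see (e.g. at the start of the Function)."""
    for name, path in compiler_report().items():
        version = compiler_version(path) if path else None
        log.info("%s -> %s (%s)", name, path, version)
```

Calling `log_compilers()` at the top of the Function and comparing the output with a local run should show whether what the build script installed is present where the code actually executes.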

I get no import errors or warnings and am now completely out of ideas; any help would be greatly appreciated.

Thanks!

Update: pip-installing the libraries in requirements.txt above and running exactly the same code works perfectly on a fresh Google Colab setup:

Auto-assigning NUTS sampler…
INFO:pymc3:Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
INFO:pymc3:Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [defs_star, atts_star, intercept, tau_def, tau_att, home]
INFO:pymc3:NUTS: [defs_star, atts_star, intercept, tau_def, tau_att, home]
Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 32 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 32 seconds.

I would be interested to hear whether anyone has ever got this working within an Azure Function, as there must be something odd going on. Thanks.

This sounds infuriating 🙂
I don't have any idea what might be causing this, but if you want to debug further, you could try experimenting with Theano's configuration options.

If you set theano.config.mode = "FAST_COMPILE", Theano will only use the Python implementations of its ops and will not call a C compiler at all. Sampling will be much slower, but if it works, we at least know that the problem is related to compilation.
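Besides assigning `theano.config.mode` in code, Theano also reads flags from the `THEANO_FLAGS` environment variable (comma-separated `key=value` pairs) when it is first imported, so the variable has to be set before `theano` or `pymc3` is imported. In an Azure Function the same string could presumably also be set as an application setting. The helper below is a hypothetical sketch that adds or overrides one flag while keeping any others already present.

```python
import os


def with_theano_flag(existing, key, value):
    """Return a THEANO_FLAGS string with key=value set, keeping the other flags."""
    flags = [
        flag
        for flag in (existing or "").split(",")
        if flag and not flag.strip().startswith(key + "=")
    ]
    flags.append(f"{key}={value}")
    return ",".join(flags)


# Must run before the first `import theano` / `import pymc3` in the process.
os.environ["THEANO_FLAGS"] = with_theano_flag(
    os.environ.get("THEANO_FLAGS"), "mode", "FAST_COMPILE"
)

# import pymc3 as pm  # imported afterwards, so Theano picks up mode=FAST_COMPILE
```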

You could also try installing PyMC3 from conda-forge and see whether that changes anything.


Thanks for the reply!

I did suspect it had something to do with the compiler; I have tried several ways of installing the compilers with the bash script in the past, and the one above is just my latest attempt.

With 'FAST_COMPILE' (I had read about this but for some reason hadn't tried it), PyMC3 does now sample fine, though quite slowly, which is expected:

[Screenshot 2020-08-18 at 10.38.24]

I'll try conda-forge to see whether that changes anything, and will also look into whether I can install the compilers, or point Theano to them, more reliably. At least it works now, thanks!
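One way to "point to the compilers" might be Theano's `cxx` flag: use an explicit compiler path when one is visible at runtime, and fall back to Python-only execution otherwise. A sketch under assumptions: the candidate compiler names are illustrative, `choose_theano_flags` is a hypothetical helper, and an empty `cxx` value is what Theano treats as "do not compile C code". Like any `THEANO_FLAGS` change, it has to happen before `theano` or `pymc3` is imported.

```python
import os
import shutil


def choose_theano_flags(candidates=("g++", "g++-5", "g++-4.9")):
    """Pick Theano flags from the first C++ compiler found on PATH.

    Falls back to FAST_COMPILE with an empty `cxx` (no C compilation) if
    none of the candidate compilers is visible to this process.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return f"cxx={path}"
    return "mode=FAST_COMPILE,cxx="


if "THEANO_FLAGS" not in os.environ:
    # Set before importing theano/pymc3 so the flags take effect.
    os.environ["THEANO_FLAGS"] = choose_theano_flags()
```

Logging the chosen flags next to `theano.config.cxx` after the import would show which branch was taken inside the Function.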