PYMC3 within an Azure Function - Timing out (30mins+) at pm.sample() stage

Thalantyr · August 17, 2020, 3:07pm

Hi there,
I have been using PyMC3 for a while and have never had any problems setting up and managing environments for it, until I recently wanted to deploy it within an Azure Function.

I have replicated the environment locally and the code works fine (and has done for ages under similar environments too); however, I am at a loss as to how to get it to work on Azure.

The section of code is based upon https://docs.pymc.io/notebooks/rugby_analytics.html and is as follows:

  import logging

  import pymc3 as pm
  import theano.tensor as tt

  # num_teams, home_team, away_team, observed_home_goals and
  # observed_away_goals are built from the match data (not shown).
  with pm.Model() as model:
    # global parameters
    home = pm.Flat('home')
    tau_att = pm.Gamma('tau_att', .1, .1)
    tau_def = pm.Gamma('tau_def', .1, .1)
    intercept = pm.Flat('intercept')

    # team-specific parameters
    atts_star = pm.Normal("atts_star",
                            mu=0.0,
                            tau=tau_att,
                            shape=num_teams)
    defs_star = pm.Normal("defs_star",
                            mu=0.0,
                            tau=tau_def,
                            shape=num_teams)

    # centre the team effects so they sum to zero
    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))
    home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team])
    away_theta = tt.exp(intercept + atts[away_team] + defs[home_team])

    # likelihood of the observed scores
    home_goals = pm.Poisson('home_goals', mu=home_theta, observed=observed_home_goals)
    away_goals = pm.Poisson('away_goals', mu=away_theta, observed=observed_away_goals)

  logging.info("Running PYMC Process")

  with model:
    # step = pm.Metropolis()
    # trace = pm.sample(2000, step=step, cores=1)
    trace = pm.sample(1000, tune=1000, cores=1)

When I run this locally it completes in around 11 seconds ("Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 11 seconds.").

When I run this within an Azure Function, it gets as far as pm.sample() and then either times out or never returns, i.e.:

[Screenshot 2020-08-17 at 15.19.38]

I have tried running with different step functions, num cores, draws, chains, different versions of requisites & PYMC3 versions and have had no luck.

I am currently using Python 3.7 on Ubuntu 16.04 x64 and have added a bash script to the build pipeline to add pre-reqs:

sudo apt install libatlas-base-dev
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git graphviz
sudo pip install Theano

sudo apt-get install g++-4.9

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 10

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 10

sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
sudo update-alternatives --set cc /usr/bin/gcc

sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
sudo update-alternatives --set c++ /usr/bin/g++
pip install --target="./.python_packages/lib/site-packages" -r requirements.txt

requirements.txt:
azure-functions
scipy==1.3.1
snowflake-connector-python==2.2.2
snowflake-sqlalchemy==1.2.2
SQLAlchemy==1.3.7
pandas==1.0.1
sklearn==0.0
matplotlib==3.1.1
numpy==1.16.4
Theano==1.0.5
Cython==0.29.16
pymc3==3.9.3
statsmodels==0.10.1

I get no other import errors or warnings and am now completely out of ideas. Any help will be greatly appreciated.

Thanks!

Thalantyr · August 17, 2020, 8:13pm

Update: pip installing the libraries in the requirements.txt above, and using the exact same code, works perfectly on a fresh Google Colab setup:

Auto-assigning NUTS sampler…
INFO:pymc3:Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
INFO:pymc3:Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [defs_star, atts_star, intercept, tau_def, tau_att, home]
INFO:pymc3:NUTS: [defs_star, atts_star, intercept, tau_def, tau_att, home]
Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 32 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 32 seconds.

Would be interested if anyone has ever got this working within an Azure Function as there must be something weird going on. Thanks.

aseyboldt · August 18, 2020, 8:25am

This sounds infuriating.
I don’t have any idea what might be causing this, but if you want to debug further you could try playing with the theano options a bit.

If you set theano.config.mode = "FAST_COMPILE", theano will not use a compiler at all. Sampling should be much slower, but if it samples fine then we at least know that the problem is related to the compiler.
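If editing the code is awkward, the same option can also be passed through the THEANO_FLAGS environment variable, which theano reads when it is first imported. Below is a rough sketch of that; the helper name set_theano_flags is made up for illustration, and it only has an effect if it runs before theano or pymc3 is imported anywhere in the process.

```python
import os


def set_theano_flags(**flags):
    """Merge key=value pairs into THEANO_FLAGS (hypothetical helper).

    Must be called before `import theano` / `import pymc3`, because
    theano reads this variable once, at import time.
    """
    existing = os.environ.get("THEANO_FLAGS", "")
    # THEANO_FLAGS is a comma-separated list of key=value entries.
    merged = dict(
        item.split("=", 1) for item in existing.split(",") if "=" in item
    )
    merged.update(flags)
    value = ",".join(f"{key}={val}" for key, val in merged.items())
    os.environ["THEANO_FLAGS"] = value
    return value


# Ask theano to skip the C compiler, as suggested above.
set_theano_flags(mode="FAST_COMPILE")

# Only import theano / pymc3 after this point.
```

On Azure the variable could presumably also be set as an application setting instead of in code, but I have not tried that myself.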

You could also try to install pymc using conda-forge and see if that changes anything.
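Another thing that might help narrow it down is finding out where pm.sample() is actually stuck. This is only a sketch using the standard library's faulthandler module, not anything PyMC3-specific: it dumps the traceback of every thread if a call is still running after a timeout. The wrapper name run_with_hang_dump and the 120 second figure are just placeholders.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s, dump_file=None):
    """Call fn(); if it is still running after timeout_s seconds,
    write the traceback of every thread to dump_file (default: stderr).

    The call itself is not interrupted, it only gets reported.
    """
    if dump_file is None:
        dump_file = sys.stderr
    faulthandler.dump_traceback_later(timeout_s, file=dump_file)
    try:
        return fn()
    finally:
        # Stop the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage inside the function app:
# with model:
#     trace = run_with_hang_dump(
#         lambda: pm.sample(1000, tune=1000, cores=1), timeout_s=120
#     )
```

If the dump shows the process waiting inside theano's compilation code, that would point the same way as the FAST_COMPILE test.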

Thalantyr · August 18, 2020, 9:41am

Thanks for the reply!

I did think it was something to do with the compiler. I have tried several ways of installing the compilers with the bash script in the past, and the above is just my latest attempt.

Using ‘FAST_COMPILE’ (I had read about this but didn’t try it for some reason), PyMC3 does actually sample fine now, although pretty slowly, which is expected:

[Screenshot 2020-08-18 at 10.38.24]

I’ll try via conda-forge and see if that changes anything and will also look into seeing if I can install / point to the compilers any better. At least this works now, thanks!
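For the compiler check, something like the following could be logged from inside the function to see which compilers the running process can actually find. It is a rough sketch using only the standard library; compiler_report is a made-up name, and it only shows what is on PATH at runtime, which may differ from what the build pipeline installed.

```python
import shutil
import subprocess


def compiler_report(names=("g++", "gcc"), timeout_s=10):
    """Return {name: description} for each compiler name (hypothetical helper)."""
    report = {}
    for name in names:
        path = shutil.which(name)
        if path is None:
            report[name] = "not found on PATH"
            continue
        try:
            # Ask the compiler for its version, giving up after timeout_s.
            proc = subprocess.run(
                [path, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=timeout_s,
            )
            lines = proc.stdout.splitlines()
            first_line = lines[0] if lines else ""
            report[name] = f"{path}: {first_line}"
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[name] = f"{path}: failed to run ({exc})"
    return report


# Hypothetical usage next to the existing logging call:
# logging.info("Compilers visible at runtime: %s", compiler_report())
```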
