Pymc on AWS Sagemaker Studio

Hey, I am trying to get pymc to work on AWS Sagemaker Studio and I’m finding it very difficult.

I found this old post, which explains some of the reasons why custom packages (particularly really heavy ones like pymc) are very difficult to use on Sagemaker. Reading this, I’d basically conclude I can’t use Sagemaker and need to go somewhere else: Installing PyMC v4 on Sagemaker

That said, that post is over a year old. Does anyone have recent experience using pymc on Sagemaker?

Which image are you using? I can pip install pymc>=5 (and other tricky packages you might want like jax and numpyro) when using the Data Science 3.0 image. I think the problems are mostly related to the older (and possibly default?) images.

I had problems with both Data Science 1.0 and 3.0, but I was using conda install.

Then, someone else recommended I use Base Python 3.0, and do pip install.
That worked, but I get the “WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions” warning.

Do you get that warning?

Also, do you know why everyone is doing pip install? I thought conda install was “better.”

I have not gotten that warning when pip installing in Data Science 3.0, likely because it has MKL etc already baked in?

I personally use pip because I have to consume all packages via AWS CodeArtifact to ensure license and security compliance.

I experimented and tried four different ways to install.

Doing “%pip install pymc>=5” on a fresh Data Science 3.0 image:

  • Installs.
  • On import, gives g++ not detected warning.
  • On import, gives NumPy C-API warning.
  • Fails import with SystemError: initialization of _internal failed without raising an exception.

I think the g++ one is a much bigger deal than the NumPy C-API one, right?
Anyway, it won’t even import here for me, so /shrug !
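For anyone hitting the same two warnings, here is a minimal sketch for checking from a notebook cell whether a C++ compiler and an OpenBLAS shared library are visible to the kernel. It uses only the standard library; the function name toolchain_report is made up for illustration. A "not found" result for openblas does not prove that no BLAS is available (MKL, for example, would not show up here), so treat it as a hint rather than a diagnosis.

```python
import ctypes.util
import shutil


def toolchain_report():
    """Return where g++ and an OpenBLAS shared library were found, if at all."""
    return {
        # Full path to the g++ executable on PATH, or None if it is missing.
        "g++": shutil.which("g++"),
        # Library name the system linker lookup resolves, or None.
        "openblas": ctypes.util.find_library("openblas"),
    }


report = toolchain_report()
for name, location in report.items():
    print(f"{name}: {location or 'not found'}")
```

If g++ comes back as not found, that would match the g++ warning on import, and my understanding is that the backend then falls back to much slower pure-Python implementations, which is why it is usually the more serious of the two.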

Doing “%conda install pymc>=5” on a fresh Data Science 3.0 image:

  • Fails install with error: ResolvePackageNotFound: - conda=22.9.0

Doing “%pip install pymc>=5” on a fresh Base Python 3.0 image:

  • Installs.
  • On import, gives NumPy C-API warning.
  • Works otherwise.
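One thing to watch with the commands above: if the install line passes through a shell, an unquoted >= can be read as an output redirection, so "pymc>=5" may silently become "install the latest pymc and write the output to a file named =5". As far as I can tell, both a terminal and the %pip magic are affected. A minimal sketch of two ways around that, assuming nothing beyond the standard library (pip_install_command is a hypothetical helper name):

```python
import shlex
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command as an argument list, so no shell ever parses it."""
    return [sys.executable, "-m", "pip", "install", requirement]


cmd = pip_install_command("pymc>=5")
print(cmd)

# If the command must go through a shell, quote the requirement first.
print("pip install " + shlex.quote("pymc>=5"))

# Uncomment to actually install into the environment of the running kernel:
# subprocess.check_call(cmd)
```

Using sys.executable also makes sure the package lands in the same Python that the kernel is running, which helps when an image ships more than one interpreter.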

Doing “%conda install pymc>=5” on a fresh Base Python 3.0 image:

  • Fails install with error: ValueError: The python kernel does not appear to be a conda environment.
    It seems conda install is not allowed from base python image.
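The two conda failures look different, so it can help to check up front what kind of environment the kernel is running in. Below is a minimal sketch, assuming that a conda environment can be recognized by a conda-meta directory under sys.prefix (that is my understanding of how conda lays out environments, not something confirmed in this thread). The function name describe_kernel_env is made up.

```python
import shutil
import sys
from pathlib import Path


def describe_kernel_env():
    """Collect a few facts about the Python environment behind this kernel."""
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # Assumption: conda environments contain a conda-meta directory.
        "looks_like_conda_env": Path(sys.prefix, "conda-meta").is_dir(),
        # Whether a conda executable is on PATH at all.
        "conda_on_path": shutil.which("conda") is not None,
    }


info = describe_kernel_env()
for key, value in info.items():
    print(f"{key}: {value}")
```

If looks_like_conda_env is False, sticking to pip on that image is probably the path of least resistance.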

Have you tried conda install -c conda-forge pymc>=5?

Yes! That fails on the Base Python 3.0 image, since apparently it does not have conda at all.

The Data Science 3.0 image gives the same error as conda install did: ResolvePackageNotFound: - conda=22.9.0

At this point, even though I can get pymc to install (with the C-API warning) on the Base Python 3.0 image, I am intimidated by Sagemaker’s other requirements for automating projects. For example, to run in a scheduled mode I need to define a custom image and use it for every spin-up. I am scared I’ll spend a bunch of time on that without a payoff.

So, I think I am pivoting my project to my personal ECS machine, which lets me create a nice fixed conda env, and where I can import pymc without any of the warnings.

I expect to return to Sagemaker in the future and try again! Would love to continue troubleshooting in parallel if anyone has more suggestions.

Have you tried micromamba?

Thanks for the suggestion. Micromamba is new to me; I read about it and tried to use it. It failed for the following reasons:

  • Base Python 3.0 image, used Sagemaker Studio’s button to “Launch Terminal in current Sagemaker Image.”
  • Did install script suggested by Micromamba installation directions (Installation — documentation).
  • Any attempt to install any package (numpy, pandas, pymc) fails with errors:
    “error libmamba No target prefix specified
    critical libmamba Aborting.”

Trying the other image…

  • Data Science 3.0 image, used Sagemaker Studio’s button to “Launch Terminal in current Sagemaker Image.”
  • Did install script suggested by Micromamba installation directions (Installation — documentation).
  • Micromamba won’t even install. Terminal complains, “bash: curl: command not found”

Promising update…documented progress here so far.

Trying to solve the C-API warning after doing “pip install pymc>=5” on the Base Python 3.0 image.

Read through this: WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.-------->HELP! - #10 by karimkhani
Tried modified versions of what lucianopaz and Lukebetham suggest in this thread.

First, I ran “pip install numpy scipy mkl”.
Second, I ran “apt-get update” and “apt-get install libopenblas-dev”.
The suggested “sudo” prefix does not work in my Sagemaker Studio terminal, so I ran the commands without it.

Then, I can restart the kernel and import pymc without any warnings!
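To confirm the warning is really gone rather than just scrolled past, the import can be wrapped so that any log messages it emits are collected. This is a minimal sketch, assuming the warning goes through Python's logging module under a logger whose name starts with "pytensor" (a guess based on the "WARNING (...)" prefix; older versions would use "theano"). The helper names are made up, and it only tells you something in a freshly restarted kernel, because an already imported module does not log again.

```python
import importlib
import logging


class _ListHandler(logging.Handler):
    """Logging handler that stores formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


def collect_logs(logger_name, action):
    """Run action() and return messages logged under logger_name meanwhile."""
    logger = logging.getLogger(logger_name)
    handler = _ListHandler()
    logger.addHandler(handler)
    try:
        action()
    finally:
        logger.removeHandler(handler)
    return handler.messages


def import_messages(module_name="pymc", logger_name="pytensor"):
    """Import a module and return what it logged, or None if it is missing."""
    try:
        return collect_logs(
            logger_name, lambda: importlib.import_module(module_name)
        )
    except ImportError:
        return None


print(import_messages())
```

An empty list means the import finished without logging anything under that logger; None means the package is not installed in this kernel.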

I don’t yet know if the apt-get commands can be automated with a scheduled Sagemaker Studio job. pip installs definitely can, by putting “%pip install” in the notebook itself, but the apt-get commands fail that way. I only know how to run them by clicking “Launch Terminal in current Sagemaker Image”, which may or may not be automatable. I will find out…

apt-get install curl

Successfully automated the apt-get commands by attaching a Sagemaker “Lifecycle Configuration” start-up script to my scheduled Notebook Jobs:

#!/bin/bash
set -eux

# https://superuser.com/questions/1496529/sudo-apt-get-update-couldnt-create-temporary-file
chmod 1777 /tmp

apt-get update
apt-get install -y libopenblas-dev

apt-get install -y graphviz
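For anyone who would rather register a script like this from code than through the console, here is a rough sketch using boto3. The create_studio_lifecycle_config call expects the script content base64-encoded. The config name "pymc-apt-deps", the helper names, and the "KernelGateway" app type are my assumptions, not details from this thread, so check them against your own setup. The register function needs boto3 and AWS credentials and is not called here.

```python
import base64

# The start-up script from the post above, as a string.
SCRIPT = """#!/bin/bash
set -eux
chmod 1777 /tmp
apt-get update
apt-get install -y libopenblas-dev
apt-get install -y graphviz
"""


def lifecycle_config_request(name, script, app_type="KernelGateway"):
    """Build the keyword arguments for create_studio_lifecycle_config."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return {
        "StudioLifecycleConfigName": name,
        "StudioLifecycleConfigContent": encoded,  # must be base64 text
        "StudioLifecycleConfigAppType": app_type,  # assumption, see above
    }


def register(request):
    """Send the request to Sagemaker; requires boto3 and AWS credentials."""
    import boto3

    client = boto3.client("sagemaker")
    return client.create_studio_lifecycle_config(**request)


request = lifecycle_config_request("pymc-apt-deps", SCRIPT)
print(request["StudioLifecycleConfigName"])
```

Keeping the request building separate from the AWS call makes it easy to inspect or test the payload before anything is created in the account.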

I work at Amazon and maintain my own pymc-based project in Sagemaker Studio. So, anyone else please feel free to reach out if you need help using Sagemaker with pymc!

A gist or blogpost guide could be very useful for other folks if you have the chance!

I installed it recently on an AWS Sagemaker notebook using two different methods:

1. 
%pip install pytensor pymc

2. 
!conda create -c conda-forge -n my_pymc_env "pymc>=5" --yes
!conda activate my_pymc_env
!pip install pymc

Hope this helps you as well!

Thanks, Sara. Great to hear you have success.

I can confirm that both methods #1 and #2 install pymc on the Base Python 3.0 image, but the C-API warning then appears on import. Much of what I wrote above is how I resolved that warning and got pymc working on Base Python 3.0 with no warnings. However, both #1 and #2 fail outright for me on the Data Science 3.0 image.