I’m new to PyMC3. I’d like to run PyMC3 in a CI pipeline (GitLab), but it seems the Docker container just starts a Jupyter notebook. How can I configure the container so that it runs my Python script instead of serving a notebook?
Or alternatively, how to run all cells of my own notebook and close it afterwards?
Or is this only supported by the Tidelift subscription?
Is the docker image already configured to use a compiler for theano?
I haven’t been able to test the container locally yet. After several failed attempts I managed to pull the docker image (v3.9.3) from within WSL2 on Windows 10, but when running a container from that image in WSL2, I don’t know how to open the notebook with a browser on Windows. The URLs it prints can’t be reached from Windows.
Any help appreciated!
A couple of things. There is no Tidelift subscription required to run PyMC3 code however you please; PyMC3 is open source, so you really can do whatever you want with it.
In terms of the docker container, there are a couple of routes here. If you’re dead set on the default PyMC3 container, you could override the last layer so it runs the command you’re looking for rather than starting a notebook. I don’t suggest this, though, as our container is set up more for interactive local use than for CI pipelines.
What I suggest is writing your own Dockerfile that installs PyMC3 on top of a Python base image, and then running whatever code you want.
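As a rough sketch (the base image tag, the package pins, and the script name `my_model.py` are placeholders, not a tested recipe), such a Dockerfile could look like:

```dockerfile
# Sketch: a CI-friendly image with PyMC3 on a plain Python base.
# Base tag, pins, and my_model.py are placeholders — adjust to your project.
FROM python:3.8-slim

# Theano compiles ops with a C compiler, so make sure one is available.
RUN apt-get update \
    && apt-get install -y --no-install-recommends g++ \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir pymc3

WORKDIR /app
COPY my_model.py .

# Run the script instead of serving a notebook.
CMD ["python", "my_model.py"]
```

In a GitLab pipeline you’d then build this image once and let the job simply run the container, rather than fighting the notebook-serving default.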
I know docker can be confusing but I do hope this points you in the right direction.
Start notebook on WSL2
On WSL2, remember to pass the (dynamic) IP address to the start command:

```shell
jupyter lab --ip $(hostname -I)
```
For users with multiple IP addresses, take the last one:

```shell
jupyter lab --ip $(echo $(hostname -I) | rev | cut -d " " -f1 | rev)
```
Then open the IP address, port, and token combination printed in the command window from your browser.
I can’t call these commands from the docker container.
The problem is that the `docker run` command starts the notebook itself, and it doesn’t seem to take arguments, or I don’t know how to provide them. I tried starting the container with the `-it` flag, but the notebook still starts and doesn’t give me the option to launch it myself as you suggest (the command line is blocked while the notebook runs). When I press Ctrl+C the notebook stops, but the container keeps running in the background.
I managed to attach VSCode to the remote docker container running in WSL2, but when opening the getting-started notebook, for example, I get ModuleNotFoundError for arviz and theano. They don’t seem to be part of the container, so the container is pretty useless to me in its current state.
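For anyone wanting to check which modules an environment actually provides without trial and error in a notebook, a small stdlib-only snippet like this works (the module names below are just the ones from the getting-started notebook):

```python
import importlib.util

def missing_modules(names):
    """Return the names that are not importable in this environment.

    Uses find_spec so nothing is actually imported (importing theano
    can be slow), only the import machinery is queried.
    """
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_modules(["arviz", "theano", "pymc3"]))
```

Running this inside the container quickly shows which of the expected packages are absent from the image.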
I will now try to set up a CI/CD environment with micromamba and install the dependencies from my environment.yml.
I don’t know enough about docker yet to create my own image as @RavinKumar suggested, but I’d prefer if there was one suitable for CI pipelines.
You’re right, the docker container is set up to run a notebook by default. To override it you can supply a command at `docker run` time, or create a new image based on ours that replaces the ENTRYPOINT and/or CMD instructions. I realize this may not mean much without knowing a lot about docker, but unfortunately that’s just the challenge of learning it. It’s a somewhat complex tool that takes a while to wrap your head around.
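To make the first route concrete, overriding at run time looks roughly like this (the image name `pymc/pymc3` is a placeholder for whatever tag you actually pulled, and these obviously need a running Docker daemon):

```shell
# Override the default CMD: run your script instead of the notebook.
docker run --rm -v "$PWD":/work -w /work pymc/pymc3 python my_script.py

# If the image defines an ENTRYPOINT rather than a CMD, override that too.
docker run --rm --entrypoint python pymc/pymc3 -c "import pymc3; print(pymc3.__version__)"
```

Anything after the image name replaces the image’s CMD, which is why the notebook normally wins when you pass no command at all.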
As you’ve mentioned, CI/CD doesn’t require docker; usually those pipelines spin up their own VM, so micromamba sounds like a fine choice to get your environment set up. From our experience, docker in CI/CD actually causes other problems, and we eventually switched off of it.
If you’d like to learn more about docker I do suggest doing the tutorials and reading the official docs, they do a much better job of explaining the tool than we can here. I also wrote a blog post about how we used docker in the ArviZ project, in which PyMC was installed as well. Hope this helps
Thanks for your kind reply and taking the time to help! In the meantime I managed to set up my CI/CD without a special PyMC3 docker image, but I’d like to revisit docker for PyMC3 at a later time for a proper solution to my question. Your blog post might come in handy then!
Due to my lack of familiarity with these tools I had quite a trial-and-error odyssey. First I tried Alpine with micromamba, which failed even when following the micromamba docs. Then I tried an Alpine + miniconda3 image, which also failed because I couldn’t get the permissions to install build tools like g++. For now, I use the regular miniconda3 docker image, install build-essential, and install everything else in my environment.yml through mamba. The resulting conda environment gets cached and shared between jobs and runs to cut down on runtime. Here’s my CD setup at the time of writing.
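Roughly, the relevant part of my `.gitlab-ci.yml` looks like the sketch below (job names, the env prefix `./.conda-env`, and `run_model.py` are simplified placeholders, not my exact config):

```yaml
# Sketch of the setup described above; names and paths are placeholders.
image: continuumio/miniconda3

cache:
  key: conda-env
  paths:
    - .conda-env/        # cache the resolved environment between runs

before_script:
  - apt-get update && apt-get install -y build-essential   # g++ for theano
  - conda install -y -c conda-forge mamba
  - mamba env update --prefix ./.conda-env --file environment.yml --prune
  - source activate ./.conda-env

run-model:
  script:
    - python run_model.py
```

The `env update` form reuses the cached prefix when it exists instead of failing on a fresh create.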
Looking over this, it seems like a very reasonable setup. We also used to use miniconda in the ArviZ CI/CD. For pipeline purposes the headaches of micro-images like Alpine really aren’t worth it. For your own time and sanity I’d advise sticking with a “large” prebuilt image with your environment, and spending your life doing things that are much more enjoyable than debugging docker build failures in CI pipelines.