Performance tip if you're on ARM64 (Apple's M1/M2 chips): Install accelerate

Using the nutpie sampler, we just found a significant performance difference between the default OpenBLAS and Apple’s Accelerate library on an M1 Mac. Accelerate is basically the MKL equivalent for Apple’s ARM64 chips.

You can change the BLAS implementation that’s installed in your env using:
micromamba install "libblas=*=*accelerate"

UPDATE Oct 17 2023:

As of PyTensor 2.17.3, accelerate gets automatically installed on ARM64. You don’t need to run the above command.
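
To confirm which BLAS variant actually ended up in an environment, one option is to read the build string of the libblas package. The following is only a minimal sketch: it assumes `conda list --json` is available on your PATH and that the conda-forge build string ends in the variant name (for example `..._accelerate` or `..._openblas`). The helper names `blas_variant` and `list_packages` are hypothetical, not part of any library.

```python
import json
import subprocess


def blas_variant(packages):
    """Return the last '_'-separated segment of libblas's build string, or None.

    `packages` is a list of dicts shaped like the output of `conda list --json`.
    """
    for pkg in packages:
        if pkg.get("name") == "libblas":
            # Assumption: the variant is the trailing segment of the build string.
            return pkg.get("build_string", "").rsplit("_", 1)[-1] or None
    return None


def list_packages(tool="conda"):
    """Run `<tool> list --json` in the active environment and parse the result."""
    result = subprocess.run(
        [tool, "list", "--json"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    try:
        print("libblas variant:", blas_variant(list_packages()))
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as err:
        print("Could not query the environment:", err)
```

If the reported variant is not the one you expected, the install command above is still the way to switch it by hand.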

6 Likes

Installed. Definitely notice a speed-up.

1 Like

Does this work on pip as well?

No, it does not work with pip. You get an error that asks whether you meant “==”. At least that’s what happened for me.

I think only conda/mamba ship their own libblas. pip would try to use the one that’s installed on your system. So you would probably have to install it there (via brew?) and then make sure everything gets compiled against that.

We should probably add this to the Conda recipe so that it happens automatically.

4 Likes

I definitely have to try to make use of my M2 CPU… Still running MKL via Rosetta2 since I only recently exited dependency hell and can’t face it with another new stack :smiley:

In case anyone cares, this is my current minimal condaenv.yml. Sadly, the very latest pymc=5.7.0 leads to more dependency hell via clang.

# Manually created as-at 2022-02-15
# Last updated as-at 2023-08-02
# NOTE:
#  + Creates a virtual env for project usage
#  + Requires an Intel x86-64 (AMD64) CPU (or Rosetta 2 on macOS)
#  + Install with mamba via Makefile, there's also a fuller set of deps to be
#    installed by pip in the pyproject.toml
#  + Force MKL version: 2022 version(s) don't work on macOS
#    see https://stackoverflow.com/a/71640311/1165112
#  + Force install BLAS with MKL via libblas (note not "blas")
#  + Force install numpy MKL: only available in defaults (pkgs/main)
#    see https://github.com/conda-forge/numpy-feedstock/issues/84#issuecomment-385186685
name: oreum_lab
channels:
  - conda-forge
  # - defaults
dependencies:
- pkgs/main::numpy>=1.24.3  # force numpy MKL see NOTE
- conda-forge::ipykernel>=6.23.1
- conda-forge::libblas=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::libcblas=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::liblapack=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::mkl==2021.4.*  # force MKL version see NOTE
- conda-forge::mkl-service==2.4.*
- conda-forge::python==3.10.*
- conda-forge::pymc==5.6.1

1 Like

Installation on native OSX ARM64 is quite trivial for me. Give it a shot; I’d give it a 95% probability that it will just work.

2 Likes

Hmm, 95% seems catastrophically low to me. :joy: Let’s rather shoot for 99.9%!

Thanks @jonsedar for sharing your magic. Let’s add it to the PyTensor conda-forge feedstock. I’m going to have to read through all your notes first.

2 Likes

I’m all ears to any improvements or fat-cutting you can suggest :smiley: This is just the result of trial and error and probably contains unnecessary stuff.

I’m installing into the latest MacOS (Ventura) via mambaforge

https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-MacOSX-x86_64.sh

Any luck with improving performance on your M2? I just got one and would love to learn what worked for you.

If that’s aimed at me, I’m still using the same recipe I noted above - no issues for now. I’ll worry about it if/when I need to speed up sampling

1 Like

Note that there really is no point in emulating x86 on ARM64 chips; it’s just slower for no benefit.
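
If you are not sure whether your Python is running natively or under Rosetta 2, a quick check can help. This is a minimal sketch, assuming macOS exposes the `sysctl.proc_translated` key (reported as 1 for a translated process). `read_proc_translated` and `classify` are hypothetical helper names used only for illustration.

```python
import platform
import subprocess
import sys


def read_proc_translated():
    """Return the value of sysctl.proc_translated as a string, or None.

    None means the key could not be read (not macOS, or the key is absent).
    """
    if sys.platform != "darwin":
        return None
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def classify(machine, proc_translated):
    """Describe how the interpreter is running, given the two probes."""
    if proc_translated == "1":
        return f"{machine} under Rosetta 2 (emulated)"
    return f"native {machine}"


if __name__ == "__main__":
    print(classify(platform.machine(), read_proc_translated()))
```

An environment created with an x86_64 installer would be expected to report the emulated case on an M1/M2 machine, while one created with an ARM64 installer should report native.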

1 Like

Sounds totally reasonable, and I would love to move to native ARM64 processing, but I typically need to deploy my code to a non-ARM CPU for production usage. Are there likely to be environment issues if I have to port code from ARM to Intel?

No, I don’t see how that could happen.

The conda version of this worked for me on an M2 Mac, i.e.:
conda install -c conda-forge 'libblas=*=*accelerate'

2 Likes

As of PyTensor 2.17.3, accelerate gets automatically installed on ARM64. You don’t need to run the above command.

3 Likes

I started to get this error after updating the OS to Sonoma (and the above 'libblas=*=*accelerate' does not solve it).

When I try to sample, I get this error:

TypeError: cannot pickle 'fortran' object

It also affects both previously installed and new virtual environments. Has anyone managed to solve this?

What’s your output of conda list?