Performance tip if you're on ARM64 (Apple's M1/M2 chips): Install accelerate

twiecki · July 11, 2023, 7:35pm

~~Using the nutpie sampler, we just found a significant performance difference between the default OpenBLAS and Apple’s Accelerate library on an M1 Mac. Accelerate is basically the MKL of ARM64 chips.~~

You can change the BLAS implementation that’s installed in your env using:
micromamba install "libblas=*=*accelerate"

UPDATE Oct 17 2023:

As of PyTensor 2.17.3, accelerate gets automatically installed on ARM64. You don’t need to run the above command.
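
To confirm which BLAS implementation an environment actually ended up with, one option is to look at the build string that `conda list libblas` prints. The following is a minimal sketch, not an official tool: the helper name `blas_variant` is hypothetical, the sample row is made up for illustration, and the code assumes conda-forge build strings end with the back-end name (for example `..._accelerate` or `..._openblas`).

```python
from typing import Optional

# Back ends assumed to appear at the end of the libblas build string
# (assumption: build strings look like "20_osxarm64_accelerate").
KNOWN_VARIANTS = ("accelerate", "openblas", "mkl", "blis", "netlib")


def blas_variant(conda_list_output: str) -> Optional[str]:
    """Return the BLAS back end named in the libblas build string, if any."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Data rows look like: name  version  build  [channel]
        if len(fields) >= 3 and fields[0] == "libblas":
            build = fields[2].lower()
            for variant in KNOWN_VARIANTS:
                if build.endswith(variant):
                    return variant
    return None


if __name__ == "__main__":
    # Illustrative sample only; paste the real output of `conda list libblas`.
    sample = (
        "# Name     Version   Build                     Channel\n"
        "libblas    3.9.0     20_osxarm64_accelerate    conda-forge\n"
    )
    print(blas_variant(sample))  # accelerate
```

In practice the text could come from `subprocess.run(["conda", "list", "libblas"], capture_output=True, text=True).stdout`. A result of `None` means no libblas row was found or the build string did not match the assumed pattern.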

zweli · July 12, 2023, 12:02pm

Installed. Definitely notice a speed-up.

fonnesbeck · July 12, 2023, 5:05pm

Does this work on pip as well?

zweli · July 12, 2023, 5:07pm

No, it does not work with pip. You get an error asking whether you meant “==”. At least that’s what it did for me.
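
The pip error is consistent with `libblas=*=*accelerate` being a conda match spec of the form `name=version=build`, where a single `=` separates the fields. pip's version specifiers use operators such as `==`, so it likely reads the single `=` as a typo. The sketch below only illustrates how the three fields break down. `split_conda_spec` is a hypothetical helper, not part of conda or pip, and it deliberately handles only the single-`=` form.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def split_conda_spec(spec: str) -> CondaSpec:
    """Split a simple 'name=version=build' conda spec into its fields."""
    if "==" in spec:
        raise ValueError("only the single '=' form is handled in this sketch")
    parts = spec.split("=")
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"not a simple conda spec: {spec!r}")
    # Pad missing version/build fields with None.
    padded = parts + [None] * (3 - len(parts))
    return CondaSpec(padded[0], padded[1], padded[2])


if __name__ == "__main__":
    spec = split_conda_spec("libblas=*=*accelerate")
    # Any version ("*"), but only builds whose string ends in "accelerate".
    print(spec.name, spec.version, spec.build)
```

Read this way, the command asks for any version of `libblas` whose build string ends in `accelerate`, which is a request about conda build variants and has no direct pip equivalent.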

twiecki · July 12, 2023, 5:35pm

I think only conda/mamba ship their own libblas. pip would try to use the one that’s installed on your system. So you would probably have to install it there (via brew?) and then make sure everything gets compiled against that.

maresb · August 3, 2023, 12:49pm

We should probably add this to the Conda recipe so that it happens automatically.

jonsedar · August 3, 2023, 2:51pm

I definitely have to try to make use of my M2 CPU… Still running MKL via Rosetta2 since I only recently exited dependency hell and can’t face it with another new stack

In case anyone cares, this is my current minimal condaenv.yml. Sadly the very latest pymc=5.7.0 leads to more dependency hell via clang…

# Manually created as-at 2022-02-15
# Last updated as-at 2023-08-02
# NOTE:
#  + Creates a virtual env for project usage
#  + Requires running on an Intel x86 AMD64 CPU (or Rosetta2 on MacOS)
#  + Install with mamba via Makefile, there's also a fuller set of deps to be
#    installed by pip in the pyproject.toml
#  + Force MKL version: 2022 version(s) don't work on MacOS
#    see https://stackoverflow.com/a/71640311/1165112
#  + Force install BLAS with MKL via libblas (note not "blas")
#  + Force install numpy MKL: only available in defaults (pkgs/main)
#    see https://github.com/conda-forge/numpy-feedstock/issues/84#issuecomment-385186685
name: oreum_lab
channels:
  - conda-forge
  # - defaults
dependencies:
- pkgs/main::numpy>=1.24.3  # force numpy MKL see NOTE
- conda-forge::ipykernel>=6.23.1
- conda-forge::libblas=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::libcblas=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::liblapack=*[build=*mkl]  # force BLAS with MKL see NOTE
- conda-forge::mkl==2021.4.*  # force MKL version see NOTE
- conda-forge::mkl-service==2.4.*
- conda-forge::python==3.10.*
- conda-forge::pymc==5.6.1

twiecki · August 3, 2023, 6:44pm

Installation on native OSX ARM64 is quite trivial for me. Give it a shot; I’d give it a 95% probability that it will just work.

maresb · August 3, 2023, 6:56pm

Hmm, 95% seems catastrophically low to me. Let’s rather shoot for 99.9%!

Thanks @jonsedar for sharing your magic. Let’s add it to the PyTensor conda-forge feedstock. I’m going to have to read through all your notes first.

jonsedar · August 3, 2023, 7:41pm

I’m all ears to any improvements or fat-cutting you can suggest. This is just the result of trial and error and probably contains unnecessary stuff.

I’m installing into the latest macOS (Ventura) via Mambaforge:

https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-MacOSX-x86_64.sh

Daniel_Jurado · August 15, 2023, 2:43pm

Any luck with improving performance on your M2? I just got one and would love to learn what worked for you.

jonsedar · August 19, 2023, 2:05pm

If that’s aimed at me, I’m still using the same recipe I noted above - no issues for now. I’ll worry about it if/when I need to speed up sampling

twiecki · August 22, 2023, 2:48pm

Note that there really is no point to emulating x86 on ARM64 chips, it’s just slower for no benefit.
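
If it is unclear whether a given Python interpreter runs natively or through Rosetta 2, a quick check along these lines can help. This is a sketch under stated assumptions: the helper names `classify_arch` and `is_translated` are hypothetical, and the code relies on the macOS `sysctl.proc_translated` key reporting `1` for a translated process.

```python
import platform
import subprocess


def classify_arch(system: str, machine: str, translated: bool) -> str:
    """Describe how the current interpreter is running."""
    if system != "Darwin":
        return "not macOS"
    if machine == "arm64":
        return "native arm64"
    if machine == "x86_64" and translated:
        return "x86_64 under Rosetta 2"
    return "native x86_64"


def is_translated() -> bool:
    """Ask macOS whether this process is being translated by Rosetta 2."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # sysctl or the key is missing (Intel Macs, non-macOS systems).
        return False
    return out == "1"


if __name__ == "__main__":
    print(classify_arch(platform.system(), platform.machine(), is_translated()))
```

An interpreter installed from an x86_64 installer on an M1/M2 Mac would be expected to report "x86_64 under Rosetta 2", while one from an arm64 installer should report "native arm64".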

jonsedar · August 22, 2023, 3:27pm

Sounds totally reasonable, and I would love to move to native arm64 processing, but I typically need to deploy my code to a non-ARM CPU for production usage. Are there likely to be environment issues if I have to port code from ARM to Intel?

twiecki · August 22, 2023, 6:21pm

No, I don’t see how that could happen.

EAly · October 2, 2023, 4:21am

The conda version of this worked for me on macOS M2 chip, i.e.,
conda install -c conda-forge 'libblas=*=*accelerate'

twiecki · October 17, 2023, 6:00pm

As of PyTensor 2.17.3, accelerate gets automatically installed on ARM64. You don’t need to run the above command.

jjarato · November 14, 2023, 9:43pm

I started getting this error after updating the OS to Sonoma (and the ‘libblas=*=*accelerate’ command above does not solve it).

jjarato · November 14, 2023, 10:14pm

When I try to sample, I get this error:

TypeError: cannot pickle 'fortran' object

It also affects both previously installed and new virtual environments. Has anyone managed to solve this?

twiecki · November 15, 2023, 9:06am

What’s the output of conda list in that environment?
