Using the nutpie sampler, we just found a significant performance difference between the default OpenBLAS and Apple's Accelerate framework on an M1 Mac. Accelerate is essentially Apple's equivalent of MKL for ARM64 chips.
You can change the BLAS implementation installed in your env using: `micromamba install "libblas=*=*accelerate"`
UPDATE Oct 17 2023:
As of PyTensor 2.17.3, Accelerate gets installed automatically on ARM64, so you no longer need to run the command above.
I think only conda/mamba ship their own libblas; pip would use whatever BLAS is installed on your system. So you would probably have to install it there (via brew?) and then make sure everything gets compiled against that.
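Whichever route you take, you can check which BLAS your numpy build is actually linked against with `numpy.show_config()` — look for "accelerate", "mkl", or "openblas" in the output. A minimal check:

```python
import numpy as np

# Prints numpy's build/link configuration, including the BLAS and
# LAPACK libraries it was compiled against.
np.show_config()
```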
I definitely have to try to make use of my M2 CPU… I'm still running MKL via Rosetta 2, since I only recently escaped dependency hell and can't face it again with another new stack.
In case anyone cares, this is my current minimal condaenv.yml. Sadly the very latest pymc=5.7.0 leads to more dependency hell via clang…
# Manually created as-at 2022-02-15
# Last updated as-at 2023-08-02
# NOTE:
# + Creates a virtual env for project usage
# + Requires an Intel x86-64 (AMD64) CPU (or Rosetta 2 on macOS)
# + Install with mamba via Makefile; there's also a fuller set of deps to be
#   installed by pip in the pyproject.toml
# + Force MKL version: 2022 version(s) don't work on macOS,
#   see https://stackoverflow.com/a/71640311/1165112
# + Force install BLAS with MKL via libblas (note: not "blas")
# + Force install numpy with MKL: only available in defaults (pkgs/main),
#   see https://github.com/conda-forge/numpy-feedstock/issues/84#issuecomment-385186685
name: oreum_lab
channels:
  - conda-forge
  # - defaults
dependencies:
  - pkgs/main::numpy>=1.24.3  # force numpy MKL, see NOTE
  - conda-forge::ipykernel>=6.23.1
  - conda-forge::libblas=*[build=*mkl]  # force BLAS with MKL, see NOTE
  - conda-forge::libcblas=*[build=*mkl]  # force BLAS with MKL, see NOTE
  - conda-forge::liblapack=*[build=*mkl]  # force BLAS with MKL, see NOTE
  - conda-forge::mkl==2021.4.*  # force MKL version, see NOTE
  - conda-forge::mkl-service==2.4.*
  - conda-forge::python==3.10.*
  - conda-forge::pymc==5.6.1
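For anyone wanting to check whether a BLAS swap (OpenBLAS vs MKL vs Accelerate) actually pays off in an env like this, a quick numpy-only timing sketch can help: a large float64 matmul is dominated by whichever BLAS numpy is linked against, so comparing this number across envs gives a rough signal. The function name and sizes here are just illustrative, not from the thread:

```python
import time
import numpy as np

def time_matmul(n: int = 2000, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) for an n x n float64 matmul."""
    rng = np.random.default_rng(42)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b  # dispatched to the linked BLAS (OpenBLAS / MKL / Accelerate)
        best = min(best, time.perf_counter() - t0)
    return best

print(f"best of 3 runs: {time_matmul():.3f}s")
```

Run it once in the MKL/Rosetta env and once in a native Accelerate env to compare.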
Sounds totally reasonable, and I would love to move to native arm64 processing, but I typically need to deploy my code to a non-ARM CPU for production usage. Are there likely to be environment issues if I have to port code from ARM to Intel?