Environment not working anymore on macOS

Duc_Quang_Nguyen · September 18, 2024, 12:00pm

I had some issues running PyMC (versions 5.6.1 and 5.16.2) after updating only Xcode (to version 15.4). I did not update macOS and am still on macOS 14 Sonoma.

I agree with @twiecki that the clang++ used by PyMC is pointing to the wrong version; it should point to the one that comes with the Xcode installation. I also agree with @Werdna48 that the issue is with Xcode.
This is the weird thing that happened after I upgraded Xcode.

I tried this in my Terminal:

xcodebuild -version

and I got this error

xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

If you get the same error, follow this Stack Overflow article: macos - xcode-select active developer directory error - Stack Overflow

Enter this command in your Terminal to point clang++ at the Xcode installation:

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

This will point Xcode (and clang++) to the correct folder.
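
To check which developer directory is active before and after the switch, here is a minimal Python sketch. It assumes `xcode-select -p` (which prints the active developer directory) can be run on the machine; the helper names `classify_developer_dir` and `active_developer_dir` are made up for illustration and are not part of any library.

```python
import subprocess
from typing import Optional


def classify_developer_dir(path: str) -> str:
    """Label an active developer directory as printed by `xcode-select -p`."""
    path = path.strip().rstrip("/")
    if path.endswith("/CommandLineTools"):
        return "command-line-tools"
    if ".app/Contents/Developer" in path:
        return "xcode"
    return "unknown"


def active_developer_dir() -> Optional[str]:
    """Return the output of `xcode-select -p`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            ["xcode-select", "-p"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    current = active_developer_dir()
    print(current, "->", classify_developer_dir(current or ""))
```

If this reports `command-line-tools` while a full Xcode is installed, that matches the situation in the error message above.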

kkumashiro · September 19, 2024, 12:17am

I upgraded my Apple Silicon machine to macOS 15.0 today and ran into the same issue with clang++ after the Xcode Command Line Tools update kicked in. I’m not sure it’s only a VS Code issue, as I don’t use VS Code anymore and instead run my Python code in a terminal or in JupyterLab. Of all the fixes and workarounds proposed in this thread so far, the only one that works for me is @twiecki’s: manually setting the clang++ path from within the Python files themselves. Uninstalling and reinstalling Anaconda, deleting and reinstalling the Command Line Tools, and installing a full version of Xcode and running xcode-select -s /Applications/Xcode.app/Contents/Developer all failed to fix the issue for me.

Duc_Quang_Nguyen · September 19, 2024, 12:26am

@kkumashiro thanks for letting us know! Just to confirm, your Xcode is 16.0, right?

kkumashiro · September 19, 2024, 12:49am

No problem! Yes, that’s right. And running clang++ -v in the terminal outputs the following:

Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I suppose I could try removing /Library/Developer/CommandLineTools and installing, say, Command Line Tools for Xcode 15.4 to see if that resolves anything, but that would still raise the question: why isn’t the clang++ from the activated environment being selected? For reference, running clang++ -v after activating my PyMC Conda environment gives a different output:

clang version 17.0.6
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /opt/anaconda3/envs/pymc_env/bin
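
To compare the two compilers programmatically, here is a small sketch that parses the `clang++ -v` text shown above. The function name `parse_clang_version_output` and the `"apple"` / `"upstream"` labels are illustrative choices, not an existing API, and the parser only handles the two output shapes quoted in this thread.

```python
import re
from typing import Dict


def parse_clang_version_output(text: str) -> Dict[str, str]:
    """Extract vendor, version, and InstalledDir from `clang++ -v` output."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^(Apple )?clang version (\S+)", line)
        if match:
            # "Apple clang" is the Xcode / Command Line Tools build.
            info["vendor"] = "apple" if match.group(1) else "upstream"
            info["version"] = match.group(2)
        elif line.startswith("InstalledDir:"):
            info["installed_dir"] = line.split(":", 1)[1].strip()
    return info


SYSTEM = """Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin"""

CONDA = """clang version 17.0.6
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /opt/anaconda3/envs/pymc_env/bin"""

for label, output in (("system", SYSTEM), ("conda env", CONDA)):
    print(label, parse_clang_version_output(output))
```

When capturing this output with `subprocess` instead of pasting it, it is worth checking both stdout and stderr, since clang typically prints its version banner to stderr.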

Duc_Quang_Nguyen · September 19, 2024, 2:02pm

On that note about the “fix”:

pytensor.config.cxx = ''

If I use this setting in my code, I get several warnings:

/Users/ducquangnguyen/opt/anaconda3/envs/pymc5_6_1/lib/python3.11/site-packages/pytensor/tensor/rewriting/elemwise.py:1019: 
UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit.

/Users/ducquangnguyen/opt/anaconda3/envs/pymc5_6_1/lib/python3.11/site-packages/pytensor/tensor/elemwise.py:781: 
RuntimeWarning: overflow encountered in square
  variables = ufunc(*ufunc_args, **ufunc_kwargs)

My model runs very slowly with that setting.

If I set this instead,

pytensor.config.cxx = '/usr/bin/clang++'

performance is better!
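
A hedged sketch of how the two settings could be combined: prefer a known compiler path, and only fall back to the empty string (no C compiler, so the slow pure-Python mode described above) when nothing is found. The helper `pick_cxx` and its parameters are hypothetical; only `pytensor.config.cxx` comes from the posts in this thread.

```python
import os
import shutil
from typing import Callable, Iterable, Optional


def pick_cxx(
    candidates: Iterable[str] = ("/usr/bin/clang++",),
    exists: Callable[[str], bool] = os.path.exists,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first usable compiler path, or '' if none is found."""
    for path in candidates:
        if exists(path):
            return path
    # Fall back to whatever clang++ is first on PATH, if any.
    return which("clang++") or ""


if __name__ == "__main__":
    cxx = pick_cxx()
    print("would set pytensor.config.cxx to:", repr(cxx))
    # With PyTensor installed, the value could then be applied as in the post:
    #   import pytensor
    #   pytensor.config.cxx = cxx
```

The `exists` and `which` parameters are injectable only so the selection logic can be exercised without touching the real filesystem.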

drbenvincent · September 22, 2024, 10:05am

I also had this problem when upgrading to macOS 15.0 on an Apple Silicon machine. The problem occurred both in VS Code and in the terminal. I fixed it by reinstalling the Command Line Tools

sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install

and rebuilding my environment.

I didn’t have to do anything else.

danieltomasz · September 23, 2024, 10:07pm

I can run simple models with pytensor.config.cxx = "/usr/bin/clang++", but now I cannot use Accelerate:

2024-09-24 01:05:42,742 [ WARNING] Using NumPy C-API based implementation for BLAS functions.

Apple bumped LAPACK to 3.11, might it be related?

jonsedar · September 25, 2024, 7:56am

Just my 2p: despite sticking with macOS Sonoma 14.7, I experienced the same issue when I foolishly allowed the Xcode Command Line Tools to “upgrade” to 16.0. My remedy was to hard-uninstall and reinstall a lower version using the same commands as Ben above. I didn’t have to rebuild the Python env either; everything worked the same as before.

Now I’m apparently on (internal, unhelpfully different from named) version 2408

$> xcode-select -v
xcode-select version 2408.

$> clang++ -v
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

danieltomasz · September 25, 2024, 8:59pm

For everyone reporting that the methods presented here work for them:
Could you confirm that you can run this without problems?

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

I can import pytensor and run simple models (I use Bambi), but it crashes on more complex ones and it does not pass this check.
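
For anyone whose shell does not handle the `$(...)` substitution, the same script path can be found from Python. This is a sketch under the assumption that `misc/check_blas.py` sits inside the installed `pytensor` package, as the command above implies; `package_file` is a hypothetical helper. It uses `importlib.util.find_spec`, which locates a top-level package without importing it.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_file(package: str, relative: str) -> Optional[Path]:
    """Locate a file inside an installed package without importing the package."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / relative
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    script = package_file("pytensor", "misc/check_blas.py")
    if script is None:
        print("pytensor (or misc/check_blas.py) not found in this environment")
    else:
        print("run with: python", script)
```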

drbenvincent · September 27, 2024, 7:25pm

So I’m not so confident in my previous fix now. After doing that, there was a software update for the Command Line Tools in System Settings > Software Update. Things stopped working at that point.

I seem to be able to do a temporary fix:

Load up an ipython session in the terminal.
Running import pymc as pm gives me the BLAS warning: WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
But if I do this:

import pytensor
pytensor.config.cxx = '/usr/bin/clang++'

Then I can run this code fine:

with pm.Model():
    pm.Normal("x")
    pm.sample()

However, the fix is temporary. After quitting the ipython session and starting a new one, I have to repeat the same steps.
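
One way to avoid repeating the assignment in every session might be the `PYTENSOR_FLAGS` environment variable, which PyTensor reads at import time as a comma-separated list of `key=value` pairs. The sketch below is an assumption about how this could be wired up, not a confirmed fix for this thread; `with_flag` is a hypothetical helper.

```python
import os


def with_flag(flags: str, key: str, value: str) -> str:
    """Return a comma-separated key=value flag string with `key` set to `value`."""
    kept = [
        item for item in flags.split(",")
        if item.strip() and item.split("=", 1)[0].strip() != key
    ]
    kept.append(f"{key}={value}")
    return ",".join(kept)


# Must run before the first `import pytensor` (or `import pymc`) in the process.
os.environ["PYTENSOR_FLAGS"] = with_flag(
    os.environ.get("PYTENSOR_FLAGS", ""), "cxx", "/usr/bin/clang++"
)
print(os.environ["PYTENSOR_FLAGS"])
```

Exporting the same variable from a shell profile (for example `export PYTENSOR_FLAGS='cxx=/usr/bin/clang++'`) would apply it to every new session without changing any Python code.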

danieltomasz · September 28, 2024, 4:55pm

Hi @drbenvincent, conda installs clang 17.0.6 by default (I am able to get OpenBLAS there, and it works without segmentation faults).
With pytensor from pip I am getting the BLAS warning.
I spent half a day today trying to get pytensor working with Accelerate, without success. One piece of information that might be useful: numpy 2.2 installed via pip uses Accelerate, while numpy 2.2 installed via conda comes with OpenBLAS, so this might be a conda issue. I created a GitHub issue in the pytensor repository here:

github.com/pymc-devs/pytensor

pytensor and blas problems on MacOS 15 Sequoia with Apple Silicon

opened 04:44PM - 28 Sep 24 UTC · danieltomasz · bug

### Describe the issue: Since update to MacOS 15 I have a problem with using …Apple implementation of BLAS. Installing `pytensor` from `miniconda3-3.12-24.7.1-0` via ` conda create -n voxel-bayes-3.12 -c conda-forge pytensor` seems to install `openblas` instead of accelerate. ``` ~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor Channels: - conda-forge - defaults Platform: osx-arm64 Collecting package metadata (repodata.json): done Solving environment: done ## Package Plan ## environment location: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12 added / updated specs: - pytensor The following NEW packages will be INSTALLED: accelerate conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0 blas conda-forge/osx-arm64::blas-2.124-openblas blas-devel conda-forge/osx-arm64::blas-devel-3.9.0-24_osxarm64_openblas brotli-python conda-forge/osx-arm64::brotli-python-1.1.0-py312hde4cb15_2 bzip2 conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7 ca-certificates conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0 cctools_osx-arm64 conda-forge/osx-arm64::cctools_osx-arm64-1010.6-h4208deb_1 certifi conda-forge/noarch::certifi-2024.8.30-pyhd8ed1ab_0 cffi conda-forge/osx-arm64::cffi-1.17.1-py312h0fad829_0 charset-normalizer conda-forge/noarch::charset-normalizer-3.3.2-pyhd8ed1ab_0 clang conda-forge/osx-arm64::clang-17.0.6-default_h360f5da_7 clang-17 conda-forge/osx-arm64::clang-17-17.0.6-default_h146c034_7 clang_impl_osx-ar~ conda-forge/osx-arm64::clang_impl_osx-arm64-17.0.6-he47c785_19 clang_osx-arm64 conda-forge/osx-arm64::clang_osx-arm64-17.0.6-h54d7cd3_19 clangxx conda-forge/osx-arm64::clangxx-17.0.6-default_h360f5da_7 clangxx_impl_osx-~ conda-forge/osx-arm64::clangxx_impl_osx-arm64-17.0.6-h50f59cd_19 clangxx_osx-arm64 conda-forge/osx-arm64::clangxx_osx-arm64-17.0.6-h54d7cd3_19 colorama conda-forge/noarch::colorama-0.4.6-pyhd8ed1ab_0 compiler-rt conda-forge/osx-arm64::compiler-rt-17.0.6-h856b3c1_2 
compiler-rt_osx-a~ conda-forge/noarch::compiler-rt_osx-arm64-17.0.6-h832e737_2 cons conda-forge/noarch::cons-0.4.6-pyhd8ed1ab_0 etuples conda-forge/noarch::etuples-0.3.9-pyhd8ed1ab_0 filelock conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0 fsspec conda-forge/noarch::fsspec-2024.9.0-pyhff2d567_0 gmp conda-forge/osx-arm64::gmp-6.3.0-h7bae524_2 gmpy2 conda-forge/osx-arm64::gmpy2-2.1.5-py312h87fada9_2 h2 conda-forge/noarch::h2-4.1.0-pyhd8ed1ab_0 hpack conda-forge/noarch::hpack-4.0.0-pyh9f0ad1d_0 huggingface_hub conda-forge/noarch::huggingface_hub-0.25.1-pyhd8ed1ab_0 hyperframe conda-forge/noarch::hyperframe-6.0.1-pyhd8ed1ab_0 icu conda-forge/osx-arm64::icu-75.1-hfee45f7_0 idna conda-forge/noarch::idna-3.10-pyhd8ed1ab_0 jinja2 conda-forge/noarch::jinja2-3.1.4-pyhd8ed1ab_0 ld64_osx-arm64 conda-forge/osx-arm64::ld64_osx-arm64-951.9-hc81425b_1 libabseil conda-forge/osx-arm64::libabseil-20240116.2-cxx17_h00cdb27_1 libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas libclang-cpp17 conda-forge/osx-arm64::libclang-cpp17-17.0.6-default_h146c034_7 libcxx conda-forge/osx-arm64::libcxx-19.1.0-ha82da77_0 libcxx-devel conda-forge/osx-arm64::libcxx-devel-17.0.6-h86353a2_6 libexpat conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0 libffi conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5 libgfortran conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3 libgfortran5 conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3 libiconv conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2 liblapack conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas liblapacke conda-forge/osx-arm64::liblapacke-3.9.0-24_osxarm64_openblas libllvm17 conda-forge/osx-arm64::libllvm17-17.0.6-h5090b49_2 libopenblas conda-forge/osx-arm64::libopenblas-0.3.27-openmp_h517c56d_1 libprotobuf conda-forge/osx-arm64::libprotobuf-4.25.3-hc39d83c_1 libsqlite conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0 libtorch 
conda-forge/osx-arm64::libtorch-2.4.0-cpu_generic_h4365fe2_1 libuv conda-forge/osx-arm64::libuv-1.49.0-hd74edd7_0 libxml2 conda-forge/osx-arm64::libxml2-2.12.7-h01dff8b_4 libzlib conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1 llvm-openmp conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1 llvm-tools conda-forge/osx-arm64::llvm-tools-17.0.6-h5090b49_2 logical-unificati~ conda-forge/noarch::logical-unification-0.4.6-pyhd8ed1ab_0 macosx_deployment~ conda-forge/noarch::macosx_deployment_target_osx-arm64-11.0-h6553868_1 markupsafe conda-forge/osx-arm64::markupsafe-2.1.5-py312h024a12e_1 minikanren conda-forge/noarch::minikanren-1.0.3-pyhd8ed1ab_0 mpc conda-forge/osx-arm64::mpc-1.3.1-h8f1351a_1 mpfr conda-forge/osx-arm64::mpfr-4.2.1-hb693164_3 mpmath conda-forge/noarch::mpmath-1.3.0-pyhd8ed1ab_0 multipledispatch conda-forge/noarch::multipledispatch-0.6.0-pyhd8ed1ab_1 ncurses conda-forge/osx-arm64::ncurses-6.5-h7bae524_1 networkx conda-forge/noarch::networkx-3.3-pyhd8ed1ab_1 nomkl conda-forge/noarch::nomkl-1.0-h5ca1d4c_0 numpy conda-forge/osx-arm64::numpy-1.26.4-py312h8442bc7_0 openblas conda-forge/osx-arm64::openblas-0.3.27-openmp_h560b219_1 openssl conda-forge/osx-arm64::openssl-3.3.2-h8359307_0 packaging conda-forge/noarch::packaging-24.1-pyhd8ed1ab_0 pip conda-forge/noarch::pip-24.2-pyh8b19718_1 psutil conda-forge/osx-arm64::psutil-6.0.0-py312h024a12e_1 pycparser conda-forge/noarch::pycparser-2.22-pyhd8ed1ab_0 pysocks conda-forge/noarch::pysocks-1.7.1-pyha2e5f31_6 pytensor conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0 pytensor-base conda-forge/osx-arm64::pytensor-base-2.25.4-py312h02baea5_0 python conda-forge/osx-arm64::python-3.12.6-h739c21a_1_cpython python_abi conda-forge/osx-arm64::python_abi-3.12-5_cp312 pytorch conda-forge/osx-arm64::pytorch-2.4.0-cpu_generic_py312h6bd8f41_1 pyyaml conda-forge/osx-arm64::pyyaml-6.0.2-py312h024a12e_1 readline conda-forge/osx-arm64::readline-8.2-h92ec313_1 requests conda-forge/noarch::requests-2.32.3-pyhd8ed1ab_0 
safetensors conda-forge/osx-arm64::safetensors-0.4.5-py312he431725_0 scipy conda-forge/osx-arm64::scipy-1.14.1-py312heb3a901_0 setuptools conda-forge/noarch::setuptools-75.1.0-pyhd8ed1ab_0 sigtool conda-forge/osx-arm64::sigtool-0.1.3-h44b9a77_0 six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0 sleef conda-forge/osx-arm64::sleef-3.7-h7783ee8_0 sympy conda-forge/noarch::sympy-1.13.3-pypyh2585a3b_103 tapi conda-forge/osx-arm64::tapi-1300.6.5-h03f4b80_0 tk conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1 toolz conda-forge/noarch::toolz-0.12.1-pyhd8ed1ab_0 tqdm conda-forge/noarch::tqdm-4.66.5-pyhd8ed1ab_0 typing-extensions conda-forge/noarch::typing-extensions-4.12.2-hd8ed1ab_0 typing_extensions conda-forge/noarch::typing_extensions-4.12.2-pyha770c72_0 tzdata conda-forge/noarch::tzdata-2024a-h8827d51_1 urllib3 conda-forge/noarch::urllib3-2.2.3-pyhd8ed1ab_0 wheel conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0 xz conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0 yaml conda-forge/osx-arm64::yaml-0.2.5-h3422bc3_2 zstandard conda-forge/osx-arm64::zstandard-0.23.0-py312h15fbf35_1 zstd conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0 Proceed ([y]/n)? y ``` Running this the check ``` python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')") Some results that you can compare against. They were 10 executions of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000). All memory layout was in C order. CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB), Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB), Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB), Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?) 
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled), Core i7 950(3.07GHz, hyper-threads enabled) Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled) Libraries tested: * numpy with ATLAS from distribution (FC9) package (1 thread) * manually compiled numpy and ATLAS with 2 threads * goto 1.26 with 1, 2, 4 and 8 threads * goto2 1.13 compiled with multiple threads enabled Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550 numpy 1.3.0 blas 775.92s numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s openblas/1 14.04s openblas/2 7.16s openblas/4 3.71s openblas/8 3.70s mkl 11.0.083/1 7.97s mkl 10.2.2.025/1 13.7s mkl 10.2.2.025/2 7.6s mkl 10.2.2.025/4 4.0s mkl 10.2.2.025/8 2.0s goto2 1.13/1 14.37s goto2 1.13/2 7.26s goto2 1.13/4 3.70s goto2 1.13/8 1.94s goto2 1.13/16 3.16s Test time in float32. There were 10 executions of gemm in float32 with matrices of shape 5000x5000 (M=N=K=5000) All memory layout was in C order. 
cuda version 8.0 7.5 7.0 gpu M40 0.45s 0.47s k80 0.92s 0.96s K6000/NOECC 0.71s 0.69s P6000/NOECC 0.25s Titan X (Pascal) 0.28s GTX Titan X 0.45s 0.45s 0.47s GTX Titan Black 0.66s 0.64s 0.64s GTX 1080 0.35s GTX 980 Ti 0.41s GTX 970 0.66s GTX 680 1.57s GTX 750 Ti 2.01s 2.01s GTX 750 2.46s 2.37s GTX 660 2.32s 2.32s GTX 580 2.42s GTX 480 2.87s TX1 7.6s (float32 storage and computation) GT 610 33.5s Some PyTensor flags: blas__ldflags= -L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64 floatX= float64 device= cpu Some OS information: sys.platform= darwin sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ] sys.prefix= /Users/daniel/.pyenv/versions/voxel-bayes-3.12 Some environment variables: MKL_NUM_THREADS= None OMP_NUM_THREADS= None GOTO_NUM_THREADS= None Numpy config: (used when the PyTensor flag "blas__ldflags" is empty) Build Dependencies: blas: detection method: pkgconfig found: true include directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include lib directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib name: blas openblas configuration: unknown pc file directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig version: 3.9.0 lapack: detection method: internal found: true include directory: unknown lib directory: unknown name: dep4569863840 openblas configuration: unknown pc file directory: unknown version: 1.26.4 Compilers: c: args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, 
-fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0 commands: arm64-apple-darwin20.0.0-clang linker: ld64 linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0 name: clang version: 16.0.6 c++: args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0 commands: arm64-apple-darwin20.0.0-clang++ linker: ld64 linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, 
-L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0 name: clang version: 16.0.6 cython: commands: cython linker: cython name: cython version: 3.0.8 Machine Information: build: cpu: aarch64 endian: little family: aarch64 system: darwin cross-compiled: true host: cpu: arm64 endian: little family: aarch64 system: darwin Python Information: path: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python version: '3.12' SIMD Extensions: baseline: - NEON - NEON_FP16 - NEON_VFPV4 - ASIMD found: - ASIMDHP not found: - ASIMDFHM Numpy dot module: numpy Numpy location: /Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__init__.py Numpy version: 1.26.4 We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000). Total execution time: 31.56s on CPU (with direct PyTensor binding to blas Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent. 
``` And when I try to run the same command but in env with pip installed pytensor results in this ``` Some PyTensor flags: blas__ldflags= compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64 floatX= float64 device= cpu Some OS information: sys.platform= darwin sys.version= 3.12.6 (main, Sep 28 2024, 17:45:34) [Clang 15.0.0 (clang-1500.3.9.4)] sys.prefix= /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6 Some environment variables: MKL_NUM_THREADS= None OMP_NUM_THREADS= None GOTO_NUM_THREADS= None Numpy config: (used when the PyTensor flag "blas__ldflags" is empty) /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output warnings.warn("Install `pyyaml` for better output", stacklevel=1) { "Compilers": { "c": { "name": "clang", "linker": "ld64", "version": "14.0.0", "commands": "cc", "args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64", "linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64" }, "cython": { "name": "cython", "linker": "cython", "version": "3.0.8", "commands": "cython" }, "c++": { "name": "clang", "linker": "ld64", "version": "14.0.0", "commands": "c++", "args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64", "linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64" } }, "Machine Information": { "host": { "cpu": "aarch64", "family": "aarch64", "endian": "little", "system": "darwin" }, "build": { "cpu": "aarch64", "family": "aarch64", "endian": "little", "system": "darwin" } }, "Build Dependencies": { "blas": { "name": "openblas64", "found": true, "version": "0.3.23.dev", "detection method": "pkgconfig", "include directory": "/opt/arm64-builds/include", "lib directory": "/opt/arm64-builds/lib", "openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3", "pc file 
directory": "/usr/local/lib/pkgconfig" }, "lapack": { "name": "dep4335021056", "found": true, "version": "1.26.4", "detection method": "internal", "include directory": "unknown", "lib directory": "unknown", "openblas configuration": "unknown", "pc file directory": "unknown" } }, "Python Information": { "path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python", "version": "3.12" }, "SIMD Extensions": { "baseline": [ "NEON", "NEON_FP16", "NEON_VFPV4", "ASIMD" ], "found": [ "ASIMDHP" ], "not found": [ "ASIMDFHM" ] } } Numpy dot module: numpy Numpy location: /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__init__.py Numpy version: 1.26.4 We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000). Total execution time: 45.75s on CPU (with direct PyTensor binding to blas). Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent. ``` When I try to specify the accelerate the old way via "libblas=*=*accelerate" when installing the conda environment, when I try to run this it fails , I copied the output here https://discourse.pymc.io/t/pytensor-support-to-apple-accelerate-blas-with-conda-forge-on-macos-15/15131/2 ### Reproducable code example: ```python from `python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")` ``` ### Error message: _No response_ ### PyTensor version information: conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0 ### Context for the issue: _No response_

danieltomasz · September 28, 2024, 11:09pm

Could you confirm: if you install pytensor from conda, do you get the *openblas or the *accelerate libraries? Also, if you get the accelerate libraries instead of openblas, can you run the pytensor check below without any problems?

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
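
As a rough aid for reading the script's output, the `blas__ldflags` line it prints can be classified with a small heuristic. This is only a sketch: `classify_blas_ldflags` is a hypothetical helper, and simple substring matching cannot tell which library a generic `-lblas` actually resolves to inside a conda environment.

```python
def classify_blas_ldflags(ldflags: str) -> str:
    """Roughly label the BLAS backend implied by a `blas__ldflags` value."""
    flags = ldflags.strip().lower()
    if not flags:
        # Empty flags are the case where the NumPy C-API fallback warning appears.
        return "none (NumPy C-API fallback)"
    if "accelerate" in flags:
        return "accelerate"
    if "openblas" in flags:
        return "openblas"
    if "mkl" in flags:
        return "mkl"
    # e.g. "-llapack -lblas -lcblas": the concrete library depends on the env.
    return "generic"


for value in ("-framework Accelerate", "", "-llapack -lblas -lcblas -lm"):
    print(repr(value), "->", classify_blas_ldflags(value))
```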

Werdna48 · September 30, 2024, 1:58am

I’m working on a MacBook Air M2 running macOS Sonoma 14.6.1.

I’ve been playing around with my environments more because I ran into an issue. After updating to Xcode 16, one of my two environments, the bap3 environment I was using, started reporting clang++ errors, something like “only clang 16>= is supported” (I am unsure of the exact wording). The environment can be found here (BAP3/bap3.yml at main · aloctavodia/BAP3 · GitHub).

While doing this I also messed around with my pymc 5.16.2 environment, somehow broke it, and had to reinstall. After reinstalling I ran python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')") and got the following output; it seems it is not using Accelerate (there are errors).

(pymc_env) uqamcka3@psy-qjlf9kt Random % python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32. There were 10 executions of gemm in
        float32 with matrices of shape 5000x5000 (M=N=K=5000)
        All memory layout was in C order.


        cuda version      8.0    7.5    7.0
        gpu
        M40               0.45s  0.47s
        k80               0.92s  0.96s
        K6000/NOECC       0.71s         0.69s
        P6000/NOECC       0.25s

        Titan X (Pascal)  0.28s
        GTX Titan X       0.45s  0.45s  0.47s
        GTX Titan Black   0.66s  0.64s  0.64s
        GTX 1080          0.35s
        GTX 980 Ti               0.41s
        GTX 970                  0.66s
        GTX 680                         1.57s
        GTX 750 Ti               2.01s  2.01s
        GTX 750                  2.46s  2.37s
        GTX 660                  2.32s  2.32s
        GTX 580                  2.42s
        GTX 480                  2.87s
        TX1                             7.6s (float32 storage and computation)
        GT 610                          33.5s
        
Some PyTensor flags:
    blas__ldflags= -framework Accelerate
    compiledir= /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:08:13) [Clang 17.0.6 ]
    sys.prefix= /opt/miniconda3/envs/pymc_env
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "clang",
      "linker": "ld64",
      "version": "16.0.6",
      "commands": "x86_64-apple-darwin13.4.0-clang",
      "args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.8",
      "commands": "cython"
    },
    "c++": {
      "name": "clang",
      "linker": "ld64",
      "version": "16.0.6",
      "commands": "x86_64-apple-darwin13.4.0-clang++",
      "args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "darwin"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "darwin"
    }
  },
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/opt/miniconda3/envs/pymc_env/include",
      "lib directory": "/opt/miniconda3/envs/pymc_env/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/opt/miniconda3/envs/pymc_env/lib/pkgconfig"
    },
    "lapack": {
      "name": "dep4461187856",
      "found": true,
      "version": "1.26.4",
      "detection method": "internal",
      "include directory": "unknown",
      "lib directory": "unknown",
      "openblas configuration": "unknown",
      "pc file directory": "unknown"
    }
  },
  "Python Information": {
    "path": "/opt/miniconda3/envs/pymc_env/bin/python",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "SSE",
      "SSE2",
      "SSE3",
      "SSSE3"
    ],
    "found": [
      "SSE41",
      "POPCNT",
      "SSE42"
    ],
    "not found": [
      "AVX",
      "F16C",
      "FMA3",
      "AVX2",
      "AVX512F",
      "AVX512CD",
      "AVX512_KNL",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]
  }
}
Numpy dot module: numpy
Numpy location: /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4

You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_rz82ylra
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: ExpandDims{axes=[0, 1]}(0.8)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
    replacements = node_rewriter.transform(fgraph, node)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
              ^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
                                                             ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
    thunk, module = self.cthunk_factory(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
    module = cache.module_from_key(key=key, lnk=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
    module = lnk.compile_cmodule(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
    module = c_compiler.compile_str(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2654, in compile_str
    raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/opt/miniconda3/envs/pymc_env/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -L/opt/miniconda3/envs/pymc_env/lib -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/core/include -I/opt/miniconda3/envs/pymc_env/include/python3.12 -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/c_code -L/opt/miniconda3/envs/pymc_env/lib -fvisibility=hidden -o /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mb782a9925f26f74c46a75d98e1484e89ff6c5c482e4b63d738d2bb93e667f8f6.so /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mod.cpp
dyld[72915]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv
  Referenced from: <E33DCAC4-3116-3019-8003-432FB3E66FB4> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
  Expected in:     <9918D37F-F19F-30B9-B311-13829B79C3B0> /opt/miniconda3/envs/pymc_env/lib/libtapi.dylib
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

In a previous section I added the line:

conda env config vars set PYTENSOR_FLAGS="blas__ldflags=-framework Accelerate"

to remove the warning "WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.", but perhaps it is just suppressing the warning, as @drbenvincent was saying.
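
For anyone who would rather set the same flag from Python than through conda, here is a minimal sketch. It assumes that PYTENSOR_FLAGS is a comma-separated list of option=value pairs and that PyTensor reads it when it is first imported; build_pytensor_flags is a made-up helper name, not part of PyTensor.

```python
import os


def build_pytensor_flags(options):
    """Join option/value pairs into a comma-separated PYTENSOR_FLAGS string.

    Hypothetical helper for illustration; it is not a PyTensor function.
    """
    return ",".join(f"{key}={value}" for key, value in options.items())


# Same single flag as in the conda command above.
flags = build_pytensor_flags({"blas__ldflags": "-framework Accelerate"})

# Set the variable before the first "import pytensor" / "import pymc",
# since the flags are assumed to be read at import time.
os.environ["PYTENSOR_FLAGS"] = flags
print(flags)
```

This only changes the current Python process, so it will not persist in the conda environment the way conda env config vars set does.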

When testing his example, I need to add an extra line to get the code to actually run:

import pymc as pm
import numpy as np
import pytensor
pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/pymc_env/lib -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
# %%
with pm.Model():
    pm.Normal("x")
    pm.sample()

If I run the code above without either of the pytensor flags set I get a compile error:

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Cast{float64}(-0.5)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
    replacements = node_rewriter.transform(fgraph, node)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
              ^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
                                                             ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
    thunk, module = self.cthunk_factory(
                    ^^^^^^^^^^^^^^^^^^^^
...
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)


You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_i_5ebbxd

You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_108aax7z

---------------------------------------------------------------------------
CompileError                              Traceback (most recent call last)
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/vm.py:1227, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
   1223 # no-recycling is done at each VM.__call__ So there is
   1224 # no need to cause duplicate c code by passing
   1225 # no_recycling here.
   1226 thunks.append(
-> 1227     node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
   1228 )
   1229 linker_make_thunk_time[node] = time.perf_counter() - thunk_start

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:119, in COp.make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
    118 try:
--> 119     return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
    120 except (NotImplementedError, MethodNotDefined):
    121     # We requested the c code, so don't catch the error.

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:84, in COp.make_c_thunk(self, node, storage_map, compute_map, no_recycling)
     83         raise NotImplementedError("float16")
---> 84 outputs = cl.make_thunk(
     85     input_storage=node_input_storage, output_storage=node_output_storage
     86 )
     87 thunk, node_input_filters, node_output_filters = outputs

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py:1182, in CLinker.make_thunk(self, input_storage, output_storage, storage_map, cache, **kwargs)
...
Inputs types: [TensorType(float32, shape=()), TensorType(float64, shape=()), TensorType(float64, shape=()), TensorType(float32, shape=()), TensorType(float64, shape=())]

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Perhaps this is a problem with my xcode?

(pymc_env) uqamcka3@psy-qjlf9kt Random % xcode-select -v
xcode-select version 2408.
(pymc_env) uqamcka3@psy-qjlf9kt Random % clang++ -v
clang version 17.0.6
Target: x86_64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /opt/miniconda3/envs/pymc_env/bin
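
The clang++ -v output above reports an InstalledDir inside the conda environment, while the linker error earlier shows Xcode's ld loading libtapi.dylib from that same environment. To see at a glance which clang++ comes first on PATH, a small check like the one below might help. It is only a sketch: is_inside and report_compiler are made-up names, and it assumes CONDA_PREFIX is set in an activated conda environment.

```python
import os
import shutil


def is_inside(path, prefix):
    """Return True if path lies under the directory prefix."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def report_compiler(name="clang++"):
    """Report the first compiler found on PATH and whether it is in the conda env."""
    found = shutil.which(name)
    prefix = os.environ.get("CONDA_PREFIX")  # assumed to be set by conda activate
    return {
        "compiler": found,
        "conda_prefix": prefix,
        "from_conda_env": bool(found and prefix and is_inside(found, prefix)),
    }


print(report_compiler())
```

If from_conda_env is True, the compiler on PATH is the conda one rather than /usr/bin/clang++, which is the situation the pytensor.config.cxx line above works around.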

Ideally it would be great to get the BAP3 environment working as this textbook would be great to learn/use.

Werdna48 · September 30, 2024, 2:14am

EDIT: I should’ve debugged a little more on my end; the solution was to add if __name__ == '__main__': as a guard clause. However, this is interesting, as before I was able to run it without needing this. If this should be deleted from the thread, let me know.

Actually, a quick follow-up.

It seems that my pymc 5.16.2 environment is not working anymore, despite working 10 minutes ago?

#%%
import pymc as pm
import numpy as np
import arviz as az
import pytensor
pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/pymc_env/lib -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
# %%
with pm.Model():
    pm.Normal("x")
    pm.sample()

gives the following error:

(pymc_env) (base) uqamcka3@x86_64-apple-darwin13 Random % /opt/miniconda3/envs/pymc_env/bin/python /Users/uqamcka3/PHD/Projects/Random/mrp.py
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Auto-assigning NUTS sampler...
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Initializing NUTS using jitter+adapt_diag...
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1243, in _mp_sample
    sampler = ps.ParallelSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 413, in __init__
    ProcessAdapter(
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 267, in __init__
    self._process.start()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/context.py", line 301, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1243, in _mp_sample
    sampler = ps.ParallelSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 413, in __init__
    ProcessAdapter(
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 267, in __init__
    self._process.start()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/context.py", line 301, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1243, in _mp_sample
    sampler = ps.ParallelSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 413, in __init__
    ProcessAdapter(
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 267, in __init__
    self._process.start()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/context.py", line 301, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1243, in _mp_sample
    sampler = ps.ParallelSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 413, in __init__
    ProcessAdapter(
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 267, in __init__
    self._process.start()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/context.py", line 301, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Traceback (most recent call last):
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1259, in _mp_sample
    for draw in sampler:
                ^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 471, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 328, in recv_draw
    msg = ready[0].recv()
          ^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 399, in _recv
    raise EOFError
EOFError

Werdna48 · September 30, 2024, 5:06am

Continuing on from my Bayesian Analysis in Python 3 (BAP3) comments: it seems the main issue I was having was Jupyter not unpickling models when multiprocessing. In my solutions I always set the following pytensor.config settings:

pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/bap3/lib -O3 -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
pytensor.config.blas__ldflags = '-framework Accelerate'

For those in the same position as me, there are three (or more) solutions:

set cores=1 to force no multiprocessing, or
set the multiprocessing backend to fork, i.e. pm.sample(1000, cores=4, chains=4, mp_ctx="fork"), or
rewrite the .ipynb notebooks as scripts and run the models under the if __name__ == '__main__': guard clause.

If this is helpful to anyone, either in terms of figuring out what is happening or getting things to work when going through the book, that would be great. (If this proves to no longer work, I will edit this comment accordingly.)
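
As a rough sketch of the options above (not a definitive recipe), the first two can be written as keyword arguments for pm.sample. sampling_kwargs and its strategy names are made up for illustration; cores, chains and mp_ctx are the pm.sample arguments already mentioned in this post. The commented usage at the bottom shows the guard clause for a script and needs PyMC installed.

```python
def sampling_kwargs(strategy, chains=4):
    """Return keyword arguments for pm.sample for one of the workarounds.

    Hypothetical helper; the strategy names are illustrative only.
    """
    if strategy == "single_core":
        # cores=1 samples the chains one after another, with no worker processes.
        return {"chains": chains, "cores": 1}
    if strategy == "fork":
        # Ask for the "fork" start method instead of the platform default.
        return {"chains": chains, "cores": chains, "mp_ctx": "fork"}
    raise ValueError(f"unknown strategy: {strategy!r}")


print(sampling_kwargs("single_core"))
print(sampling_kwargs("fork"))

# Usage in a script, with the guard clause (requires PyMC):
#
# import pymc as pm
#
# if __name__ == "__main__":
#     with pm.Model():
#         pm.Normal("x")
#         idata = pm.sample(1000, **sampling_kwargs("fork"))
```

One caveat: the Python multiprocessing documentation describes the fork start method as unsafe on macOS, so cores=1 or the guard clause may be the more conservative choice.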

jonsedar · September 30, 2024, 6:17am

Just FYI, below is the full (rather a lot!) output you requested, for an environment / project that I’m currently working on; maybe helpful?

FYI, I use mamba, part of miniforge (GitHub - conda-forge/miniforge: A conda-forge distribution), to handle environments, although I generally install via pip and makefiles. It’s a fairly arcane process these days, but I can go into detail if you want.


        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32. There were 10 executions of gemm in
        float32 with matrices of shape 5000x5000 (M=N=K=5000)
        All memory layout was in C order.


        cuda version      8.0    7.5    7.0
        gpu
        M40               0.45s  0.47s
        k80               0.92s  0.96s
        K6000/NOECC       0.71s         0.69s
        P6000/NOECC       0.25s

        Titan X (Pascal)  0.28s
        GTX Titan X       0.45s  0.45s  0.47s
        GTX Titan Black   0.66s  0.64s  0.64s
        GTX 1080          0.35s
        GTX 980 Ti               0.41s
        GTX 970                  0.66s
        GTX 680                         1.57s
        GTX 750 Ti               2.01s  2.01s
        GTX 750                  2.46s  2.37s
        GTX 660                  2.32s  2.32s
        GTX 580                  2.42s
        GTX 480                  2.87s
        TX1                             7.6s (float32 storage and computation)
        GT 610                          33.5s
        
Some PyTensor flags:
    blas__ldflags= -L/Users/jon/miniforge/envs/oreum_survival/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib
    compiledir= /Users/jon/.pytensor/compiledir_macOS-14.7-arm64-arm-64bit-arm-3.11.9-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
    sys.prefix= /Users/jon/miniforge/envs/oreum_survival
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /Users/jon/miniforge/envs/oreum_survival/include
    lib directory: /Users/jon/miniforge/envs/oreum_survival/lib
    name: blas
    openblas configuration: unknown
    pc file directory: /Users/jon/miniforge/envs/oreum_survival/lib/pkgconfig
    version: 3.9.0
  lapack:
    detection method: internal
    found: true
    include directory: unknown
    lib directory: unknown
    name: dep4377784592
    openblas configuration: unknown
    pc file directory: unknown
    version: 1.26.4
Compilers:
  c:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
      /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib,
      -L/Users/jon/miniforge/envs/oreum_survival/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -isystem, /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  c++:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
      -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang++
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib,
      -L/Users/jon/miniforge/envs/oreum_survival/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0,
      -isystem, /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.8
Machine Information:
  build:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
  cross-compiled: true
  host:
    cpu: arm64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /Users/jon/miniforge/envs/oreum_survival/bin/python
  version: '3.11'
SIMD Extensions:
  baseline:
  - NEON
  - NEON_FP16
  - NEON_VFPV4
  - ASIMD
  found:
  - ASIMDHP
  not found:
  - ASIMDFHM

Numpy dot module: numpy
Numpy location: /Users/jon/miniforge/envs/oreum_survival/lib/python3.11/site-packages/numpy/__init__.py
Numpy version: 1.26.4

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 7.96s on CPU (with direct PyTensor binding to blas).

Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.

twelvespot · October 15, 2024, 1:00pm

Hi, I’m just finding this thread and have the same problems with my 2023 M2 MacBook Air.

I cannot run pm.sample(). I get all the same C and PyTensor errors.

Reading through all of this, is there a permanent fix yet? It sounds like folks can make some specific path adjustments and might get things to work. However, I’m new to this and just starting the Intuitive Bayes Intro Course. Could I set up the environment in Google Colab, complete the course, and bypass my personal computer for now?

ricardoV94 · October 15, 2024, 4:39pm

You can use Colab. Otherwise you can disable C backend on your local machine by running this code at the top of your script/notebook:

import pytensor
pytensor.config.cxx=""

Sampling may be slower, but it may still be fast enough for the purposes of the course.
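
For anyone who prefers not to edit every notebook, the same setting can, as far as I know, also be passed through the PYTENSOR_FLAGS environment variable (comma-separated key=value pairs), provided it is set before pytensor is first imported. A rough sketch; add_pytensor_flag is a hypothetical helper name, not a PyTensor API:

```python
import os


def add_pytensor_flag(flag, env=os.environ):
    """Add or override one key=value entry in PYTENSOR_FLAGS.

    Hypothetical helper; env defaults to the real process environment.
    """
    key = flag.split("=", 1)[0]
    existing = env.get("PYTENSOR_FLAGS", "")
    # Keep every existing entry except an older value for the same key.
    parts = [p for p in existing.split(",") if p and p.split("=", 1)[0] != key]
    parts.append(flag)
    env["PYTENSOR_FLAGS"] = ",".join(parts)
    return env["PYTENSOR_FLAGS"]


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"PYTENSOR_FLAGS": "floatX=float64,cxx=/usr/bin/clang++"}
print(add_pytensor_flag("cxx=", demo_env))

# Hypothetical real usage, before the first "import pytensor":
# add_pytensor_flag("cxx=")
# import pytensor
```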

fonnesbeck · October 15, 2024, 4:48pm

The multiprocessing error reported by @Werdna48 is related to the VSCode Jupyter extension, and not PyMC on the Mac per se. See if the bug persists outside of VSCode.

danieltomasz · November 4, 2024, 9:00pm

There were problems with linking to Accelerate. The latest GitHub commit of pytensor should have this bug solved; see: pytensor and blas problems on on MacOS 15 Sequoia with Apple Silicon · Issue #1005 · pymc-devs/pytensor · GitHub
