Environment not working anymore on macOS

I had some issues running PyMC (versions 5.6.1 and 5.16.2) after updating only Xcode (to version 15.4). I am still on macOS 14 Sonoma.

I agree with @twiecki that PyMC's clang++ is pointing to the wrong version; it should point to the one from the Xcode installation. I also agree with @Werdna48 that the issue is with Xcode.
This is the weird thing that happened after I upgraded Xcode.

I tried this in my Terminal:

xcodebuild -version

and got this error:

xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

If anyone hits this same error, follow this article: macos - xcode-select active developer directory error - Stack Overflow

Enter this command in your Terminal to point clang++ at the right toolchain:

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

This will point Xcode (and clang++) to the correct folder.
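To confirm the switch took effect, a quick check can help (a sketch; on a non-macOS machine xcode-select simply won't exist, so the snippet guards for that):

```shell
# Print the active developer directory; after the fix it should be
# /Applications/Xcode.app/Contents/Developer, not the CommandLineTools path.
if command -v xcode-select >/dev/null 2>&1; then
  xcode-select -p
  clang++ --version | head -n 1
else
  echo "xcode-select not found (not on macOS?)"
fi
```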

I upgraded my Apple Silicon machine to macOS 15.0 today and ran into the same issue with clang++ after the Xcode Command Line Tools update kicked in. I’m not sure if it’s only a VSCode issue, as I don’t use VSCode anymore and instead run my Python code in a terminal or in JupyterLab. Of all the fixes and workarounds proposed in this thread so far, the only one I’ve found that works for me is @twiecki’s: manually setting the clang++ path from within the Python files themselves. Neither uninstalling and reinstalling Anaconda, deleting and reinstalling the Command Line Tools, nor installing a full version of Xcode and running xcode-select -s /Applications/Xcode.app/Contents/Developer fixed the issue for me.

@kkumashiro thanks for letting us know! Just to confirm, your Xcode is 16.0, right?

No problem! Yes, that’s right. And running clang++ -v in the terminal outputs the following:

Apple clang version 16.0.0 (clang-1600.0.26.3)
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I suppose I could try removing /Library/Developer/CommandLineTools and installing, say, the Command Line Tools for Xcode 15.4 to see if that resolves anything, but that would still raise the question: why isn’t the clang++ from the activated environment being selected? For reference, running clang++ -v after activating my PyMC Conda environment results in a different output:
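One way to see which compilers are candidates is to list every clang++ on the PATH in precedence order (a sketch; run it with the Conda environment activated, and note that PyTensor picks its compiler from its own config rather than purely from the PATH):

```shell
# Each clang++ visible on PATH, highest precedence first; the conda
# env's bin directory should come before /usr/bin when the env is active.
which -a clang++ 2>/dev/null || echo "no clang++ on PATH"
echo "PATH=$PATH"
```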

clang version 17.0.6
Target: arm64-apple-darwin24.0.0
Thread model: posix
InstalledDir: /opt/anaconda3/envs/pymc_env/bin

On that note about the “fix”:

pytensor.config.cxx = ''

If I use this setting in my code, I get several warnings:

/Users/ducquangnguyen/opt/anaconda3/envs/pymc5_6_1/lib/python3.11/site-packages/pytensor/tensor/rewriting/elemwise.py:1019: 
UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit.
/Users/ducquangnguyen/opt/anaconda3/envs/pymc5_6_1/lib/python3.11/site-packages/pytensor/tensor/elemwise.py:781: 
RuntimeWarning: overflow encountered in square
  variables = ufunc(*ufunc_args, **ufunc_kwargs)

My model runs very, very slowly with that setting turned on.

If I set this instead,

pytensor.config.cxx = '/usr/bin/clang++'

performance is much better!
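To avoid putting that line in every script, the same override can live in PyTensor's config file at ~/.pytensorrc (a sketch using PyTensor's standard config-file mechanism; it assumes the system compiler really is at /usr/bin/clang++, which you can check with ls /usr/bin/clang++ first):

```ini
[global]
cxx = /usr/bin/clang++
```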


I also had this problem when upgrading to macOS 15.0 on an Apple Silicon machine. The problem occurred both in VS Code and in the terminal. But I fixed it by reinstalling the Command Line Tools

sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install

and rebuilding my environment.

I didn’t have to do anything else.


I can run simple models with pytensor.config.cxx = "/usr/bin/clang++", but now I cannot use Accelerate:

2024-09-24 01:05:42,742 [ WARNING] Using NumPy C-API based implementation for BLAS functions.

Apple bumped LAPACK to 3.11; might that be related?

Just my 2p: despite sticking with macOS Sonoma 14.7, I experienced the same issue when I foolishly allowed the Xcode Command Line Tools to “upgrade” to 16.0. My remedy was to hard-uninstall and reinstall a lower version using the same commands as Ben above. I didn't have to rebuild the Python env either; everything worked the same as before.

Now I’m apparently on internal version 2408 (which, unhelpfully, differs from the marketing version):

$> xcode-select -v
xcode-select version 2408.

$> clang++ -v
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

For everyone reporting that the methods presented here work for them:
could you confirm that you can run this without problems?

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

I can import pytensor and run simple models (I use bambi), but it crashes on more complex ones and doesn't pass this check.

So I’m not so confident in my previous fix now. After applying it, a software update for the Command Line Tools appeared in System Settings > Software Update, and things stopped working at that point.

I seem to have a temporary fix:

  1. Load up an ipython session in terminal
  2. import pymc as pm gives me the BLAS error, WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
  3. But if I do this:
import pytensor
pytensor.config.cxx = '/usr/bin/clang++'

Then I can run this code fine:

with pm.Model():
    pm.Normal("x")
    pm.sample()

However, the fix is temporary: after quitting the ipython session and loading it up again, I have to follow the same steps.
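One way to avoid retyping those lines each session is PyTensor's environment-variable mechanism (a sketch using the documented PYTENSOR_FLAGS syntax; putting the export in ~/.zshrc would make it stick across terminals):

```shell
# Equivalent to pytensor.config.cxx = '/usr/bin/clang++', applied to
# every Python process started from this shell.
export PYTENSOR_FLAGS="cxx=/usr/bin/clang++"
echo "$PYTENSOR_FLAGS"
```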

Context:

Hi @drbenvincent, conda installs clang 17.0.6 by default (I am able to get OpenBLAS there, and it works without segmentation errors). With pytensor from pip I am getting the BLAS warning.
I spent half a day today trying to get pytensor working with Accelerate, without success. One piece of info that might be useful: numpy 2.2 installed via pip uses Accelerate, while numpy 2.2 installed via conda uses OpenBLAS, so this might be a conda issue. I created an issue on the pytensor GitHub here.

Could you confirm: if you install pytensor from conda, do you get *openblas or *accelerate libraries? And if you get the accelerate libraries instead of openblas, can you run the pytensor test without any problems?

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

I’m working on a MacBook Air M2 running Sonoma 14.6.1.

I’ve been playing around with my environments more, as I ran into an issue. After updating to Xcode 16, one of my two environments, the bap3 environment I was using, started reporting clang++ errors, something like “only clang >= 16 is supported” (I am unsure of the exact wording). The environment can be found here: BAP3/bap3.yml at main · aloctavodia/BAP3 · GitHub.

In doing this I tried messing around with my pymc 5.16.2 environment, somehow breaking it and needing to reinstall. After reinstalling I ran python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')") and got the following traceback; it seems it is not using Accelerate (there are errors).

(pymc_env) uqamcka3@psy-qjlf9kt Random % python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32. There were 10 executions of gemm in
        float32 with matrices of shape 5000x5000 (M=N=K=5000)
        All memory layout was in C order.


        cuda version      8.0    7.5    7.0
        gpu
        M40               0.45s  0.47s
        k80               0.92s  0.96s
        K6000/NOECC       0.71s         0.69s
        P6000/NOECC       0.25s

        Titan X (Pascal)  0.28s
        GTX Titan X       0.45s  0.45s  0.47s
        GTX Titan Black   0.66s  0.64s  0.64s
        GTX 1080          0.35s
        GTX 980 Ti               0.41s
        GTX 970                  0.66s
        GTX 680                         1.57s
        GTX 750 Ti               2.01s  2.01s
        GTX 750                  2.46s  2.37s
        GTX 660                  2.32s  2.32s
        GTX 580                  2.42s
        GTX 480                  2.87s
        TX1                             7.6s (float32 storage and computation)
        GT 610                          33.5s
        
Some PyTensor flags:
    blas__ldflags= -framework Accelerate
    compiledir= /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:08:13) [Clang 17.0.6 ]
    sys.prefix= /opt/miniconda3/envs/pymc_env
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "clang",
      "linker": "ld64",
      "version": "16.0.6",
      "commands": "x86_64-apple-darwin13.4.0-clang",
      "args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.8",
      "commands": "cython"
    },
    "c++": {
      "name": "clang",
      "linker": "ld64",
      "version": "16.0.6",
      "commands": "x86_64-apple-darwin13.4.0-clang++",
      "args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
      "linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "darwin"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "darwin"
    }
  },
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/opt/miniconda3/envs/pymc_env/include",
      "lib directory": "/opt/miniconda3/envs/pymc_env/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/opt/miniconda3/envs/pymc_env/lib/pkgconfig"
    },
    "lapack": {
      "name": "dep4461187856",
      "found": true,
      "version": "1.26.4",
      "detection method": "internal",
      "include directory": "unknown",
      "lib directory": "unknown",
      "openblas configuration": "unknown",
      "pc file directory": "unknown"
    }
  },
  "Python Information": {
    "path": "/opt/miniconda3/envs/pymc_env/bin/python",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "SSE",
      "SSE2",
      "SSE3",
      "SSSE3"
    ],
    "found": [
      "SSE41",
      "POPCNT",
      "SSE42"
    ],
    "not found": [
      "AVX",
      "F16C",
      "FMA3",
      "AVX2",
      "AVX512F",
      "AVX512CD",
      "AVX512_KNL",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]
  }
}
Numpy dot module: numpy
Numpy location: /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4

You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_rz82ylra
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: ExpandDims{axes=[0, 1]}(0.8)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
    replacements = node_rewriter.transform(fgraph, node)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
              ^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
                                                             ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
    thunk, module = self.cthunk_factory(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
    module = cache.module_from_key(key=key, lnk=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
    module = lnk.compile_cmodule(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
    module = c_compiler.compile_str(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2654, in compile_str
    raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/opt/miniconda3/envs/pymc_env/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -L/opt/miniconda3/envs/pymc_env/lib -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/core/include -I/opt/miniconda3/envs/pymc_env/include/python3.12 -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/c_code -L/opt/miniconda3/envs/pymc_env/lib -fvisibility=hidden -o /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mb782a9925f26f74c46a75d98e1484e89ff6c5c482e4b63d738d2bb93e667f8f6.so /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mod.cpp
dyld[72915]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv
  Referenced from: <E33DCAC4-3116-3019-8003-432FB3E66FB4> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
  Expected in:     <9918D37F-F19F-30B9-B311-13829B79C3B0> /opt/miniconda3/envs/pymc_env/lib/libtapi.dylib
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

In a previous section I added the line:

conda env config vars set PYTENSOR_FLAGS="blas__ldflags=-framework Accelerate"

to remove this WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions., but perhaps it is just suppressing the warning, as @drbenvincent was saying.
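One way to tell whether the flag is actually reaching PyTensor, rather than merely silencing the warning, is to inspect the resolved config value (a sketch; resolved_blas_ldflags is a hypothetical helper, guarded so the check also runs where pytensor is absent):

```python
import importlib.util

def resolved_blas_ldflags():
    """Return PyTensor's resolved BLAS link flags, or None if pytensor
    is not installed (hypothetical helper for this check)."""
    if importlib.util.find_spec("pytensor") is None:
        return None
    import pytensor
    return pytensor.config.blas__ldflags

flags = resolved_blas_ldflags()
# An empty string means PyTensor fell back to the slow NumPy C-API BLAS
# path; '-framework Accelerate' means the environment variable took effect.
print("blas__ldflags:", repr(flags))
```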

In testing his example, I needed to add an extra line to get the code to actually run:

import pymc as pm
import numpy as np
import pytensor
pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/pymc_env/lib -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
# %%
with pm.Model():
    pm.Normal("x")
    pm.sample()

If I run the code above without either of the pytensor flags set, I get a compile error:

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Cast{float64}(-0.5)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
    replacements = node_rewriter.transform(fgraph, node)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
              ^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
                                                             ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
    thunk, module = self.cthunk_factory(
                    ^^^^^^^^^^^^^^^^^^^^
...
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)


You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_i_5ebbxd

You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_108aax7z

---------------------------------------------------------------------------
CompileError                              Traceback (most recent call last)
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/vm.py:1227, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
   1223 # no-recycling is done at each VM.__call__ So there is
   1224 # no need to cause duplicate c code by passing
   1225 # no_recycling here.
   1226 thunks.append(
-> 1227     node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
   1228 )
   1229 linker_make_thunk_time[node] = time.perf_counter() - thunk_start

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:119, in COp.make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
    118 try:
--> 119     return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
    120 except (NotImplementedError, MethodNotDefined):
    121     # We requested the c code, so don't catch the error.

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:84, in COp.make_c_thunk(self, node, storage_map, compute_map, no_recycling)
     83         raise NotImplementedError("float16")
---> 84 outputs = cl.make_thunk(
     85     input_storage=node_input_storage, output_storage=node_output_storage
     86 )
     87 thunk, node_input_filters, node_output_filters = outputs

File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py:1182, in CLinker.make_thunk(self, input_storage, output_storage, storage_map, cache, **kwargs)
...
Inputs types: [TensorType(float32, shape=()), TensorType(float64, shape=()), TensorType(float64, shape=()), TensorType(float32, shape=()), TensorType(float64, shape=())]

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Perhaps this is a problem with my Xcode?

(pymc_env) uqamcka3@psy-qjlf9kt Random % xcode-select -v
xcode-select version 2408.
(pymc_env) uqamcka3@psy-qjlf9kt Random % clang++ -v
clang version 17.0.6
Target: x86_64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /opt/miniconda3/envs/pymc_env/bin

Ideally it would be great to get the BAP3 environment working, as this textbook would be great to learn from and use.

EDIT: I should’ve debugged a little more on my end; the solution was to add if __name__ == '__main__': as a guard clause. Interestingly, I was able to run it without this before. If this should be deleted from the thread, let me know.
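For readers who hit the same RuntimeError later in this thread: pm.sample() starts worker processes, and with a non-fork start method (the macOS default, or the forkserver shown in the traceback) each worker re-imports the script, so the sampling call must sit behind the guard. A minimal stdlib-only sketch of the pattern (sample_chain is a hypothetical stand-in for the model code):

```python
import multiprocessing as mp

def sample_chain(seed):
    # Stand-in for the work one chain would do in the real script.
    return seed * 2

if __name__ == "__main__":
    # Without this guard, each spawned worker re-imports the script,
    # reaches the process-starting call itself, and raises the
    # "bootstrapping phase" RuntimeError shown in the traceback below.
    with mp.Pool(2) as pool:
        print(pool.map(sample_chain, [0, 1, 2, 3]))  # [0, 2, 4, 6]
```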

Actually, a quick follow-up.

It seems that my pymc 5.16.2 environment is not working anymore, despite working ten minutes ago.

#%%
import pymc as pm
import numpy as np
import arviz as az
import pytensor
pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/pymc_env/lib -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
# %%
with pm.Model():
    pm.Normal("x")
    pm.sample()

gives the following error:

(pymc_env) (base) uqamcka3@x86_64-apple-darwin13 Random % /opt/miniconda3/envs/pymc_env/bin/python /Users/uqamcka3/PHD/Projects/Random/mrp.py
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Auto-assigning NUTS sampler...
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Initializing NUTS using jitter+adapt_diag...
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1243, in _mp_sample
    sampler = ps.ParallelSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 413, in __init__
    ProcessAdapter(
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 267, in __init__
    self._process.start()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/context.py", line 301, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:02
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Traceback (most recent call last):
  File "/Users/uqamcka3/PHD/Projects/Random/mrp.py", line 11, in <module>
    pm.sample()
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 846, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/mcmc.py", line 1259, in _mp_sample
    for draw in sampler:
                ^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 471, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/parallel.py", line 328, in recv_draw
    msg = ready[0].recv()
          ^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/pymc_env/lib/python3.12/multiprocessing/connection.py", line 399, in _recv
    raise EOFError
EOFError

Continuing on from my Bayesian Analysis in Python 3 (BAP3) comments: it seems the main issue I was having was Jupyter failing to unpickle models when multiprocessing. For my setups I always apply the following pytensor.config settings:

pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/bap3/lib -O3 -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
pytensor.config.blas__ldflags = '-framework Accelerate'
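
As an aside, the same overrides can be supplied via the PYTENSOR_FLAGS environment variable so they take effect before any module-level compilation happens. This is only a sketch; the paths here match my environment and you should substitute your own:

```shell
# Illustrative: comma-separated key=value pairs, same names as
# pytensor.config (section and option joined by double underscore).
PYTENSOR_FLAGS="cxx=/usr/bin/clang++,blas__ldflags=-framework Accelerate" \
    python my_model.py
```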

For those in the same position as me, there are (at least) three solutions:

  1. set cores=1 to disable multiprocessing, or
  2. set the multiprocessing backend to fork, i.e. pm.sample(1000, cores=4, chains=4, mp_ctx="fork"), or
  3. rewrite the .ipynbs into scripts and run the models under the if __name__ == '__main__': guard clause.
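
A minimal, self-contained sketch of option 3 (not the book's model): it uses the stdlib multiprocessing machinery that pm.sample() drives under the hood, which shows why the guard, combined with the fork start method, avoids the "bootstrapping phase" re-import error in the tracebacks above:

```python
# Minimal sketch: child processes started with "spawn"/"forkserver"
# re-import this file, so any top-level sampling call would recurse.
# The __main__ guard prevents that; the "fork" context (mirroring
# mp_ctx="fork" in pm.sample) additionally inherits the parent's
# memory instead of re-importing the script at all.
import multiprocessing as mp

def _square(x):
    return x * x

if __name__ == "__main__":
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        print(pool.map(_square, [1, 2, 3]))  # [1, 4, 9]
```

In a real script, the model definition and the pm.sample() call both go under the guard.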

I hope this is helpful to someone, either for figuring out what is happening or for getting the book's examples to work (if this proves to no longer work, I will edit this comment accordingly).

Just FYI, below is the full (rather long!) output you requested, for an environment/project that I'm currently working on. Maybe it's helpful?

FYI I use mamba, part of miniforge (GitHub - conda-forge/miniforge), to handle environments, although I generally install via pip and makefiles. It's a fairly arcane process these days, but I can go into detail if you want :smiley:


        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32. There were 10 executions of gemm in
        float32 with matrices of shape 5000x5000 (M=N=K=5000)
        All memory layout was in C order.


        cuda version      8.0    7.5    7.0
        gpu
        M40               0.45s  0.47s
        k80               0.92s  0.96s
        K6000/NOECC       0.71s         0.69s
        P6000/NOECC       0.25s

        Titan X (Pascal)  0.28s
        GTX Titan X       0.45s  0.45s  0.47s
        GTX Titan Black   0.66s  0.64s  0.64s
        GTX 1080          0.35s
        GTX 980 Ti               0.41s
        GTX 970                  0.66s
        GTX 680                         1.57s
        GTX 750 Ti               2.01s  2.01s
        GTX 750                  2.46s  2.37s
        GTX 660                  2.32s  2.32s
        GTX 580                  2.42s
        GTX 480                  2.87s
        TX1                             7.6s (float32 storage and computation)
        GT 610                          33.5s
        
Some PyTensor flags:
    blas__ldflags= -L/Users/jon/miniforge/envs/oreum_survival/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib
    compiledir= /Users/jon/.pytensor/compiledir_macOS-14.7-arm64-arm-64bit-arm-3.11.9-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
    sys.prefix= /Users/jon/miniforge/envs/oreum_survival
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /Users/jon/miniforge/envs/oreum_survival/include
    lib directory: /Users/jon/miniforge/envs/oreum_survival/lib
    name: blas
    openblas configuration: unknown
    pc file directory: /Users/jon/miniforge/envs/oreum_survival/lib/pkgconfig
    version: 3.9.0
  lapack:
    detection method: internal
    found: true
    include directory: unknown
    lib directory: unknown
    name: dep4377784592
    openblas configuration: unknown
    pc file directory: unknown
    version: 1.26.4
Compilers:
  c:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
      /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib,
      -L/Users/jon/miniforge/envs/oreum_survival/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -isystem, /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  c++:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
      -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang++
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/jon/miniforge/envs/oreum_survival/lib,
      -L/Users/jon/miniforge/envs/oreum_survival/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0,
      -isystem, /Users/jon/miniforge/envs/oreum_survival/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225640867/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/jon/miniforge/envs/oreum_survival=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/jon/miniforge/envs/oreum_survival/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.8
Machine Information:
  build:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
  cross-compiled: true
  host:
    cpu: arm64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /Users/jon/miniforge/envs/oreum_survival/bin/python
  version: '3.11'
SIMD Extensions:
  baseline:
  - NEON
  - NEON_FP16
  - NEON_VFPV4
  - ASIMD
  found:
  - ASIMDHP
  not found:
  - ASIMDFHM

Numpy dot module: numpy
Numpy location: /Users/jon/miniforge/envs/oreum_survival/lib/python3.11/site-packages/numpy/__init__.py
Numpy version: 1.26.4

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 7.96s on CPU (with direct PyTensor binding to blas).

Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.