I'm working on a MacBook Air M2 running macOS Sonoma 14.6.1.
I've been experimenting with my environments because I ran into an issue. After updating to Xcode 16, one of my two environments, the bap3 environment I had been using, started reporting clang++ errors. The message was something like "only clang 16>= is supported", though I am unsure of the exact wording. The environment file can be found here (BAP3/bap3.yml at main · aloctavodia/BAP3 · GitHub).
While investigating, I tried changing things in my pymc 5.16.2 environment, somehow broke it, and had to reinstall. After reinstalling I ran:
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
The output and traceback are below. It seems it is not using Accelerate (there are errors).
(pymc_env) uqamcka3@psy-qjlf9kt Random % python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
Some results that you can compare against. They were 10 executions
of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
All memory layout was in C order.
CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Libraries tested:
* numpy with ATLAS from distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads
* goto2 1.13 compiled with multiple threads enabled
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s
goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s
goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s
openblas/1 14.04s
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32. There were 10 executions of gemm in
float32 with matrices of shape 5000x5000 (M=N=K=5000)
All memory layout was in C order.
cuda version 8.0 7.5 7.0
gpu
M40 0.45s 0.47s
k80 0.92s 0.96s
K6000/NOECC 0.71s 0.69s
P6000/NOECC 0.25s
Titan X (Pascal) 0.28s
GTX Titan X 0.45s 0.45s 0.47s
GTX Titan Black 0.66s 0.64s 0.64s
GTX 1080 0.35s
GTX 980 Ti 0.41s
GTX 970 0.66s
GTX 680 1.57s
GTX 750 Ti 2.01s 2.01s
GTX 750 2.46s 2.37s
GTX 660 2.32s 2.32s
GTX 580 2.42s
GTX 480 2.87s
TX1 7.6s (float32 storage and computation)
GT 610 33.5s
Some PyTensor flags:
blas__ldflags= -framework Accelerate
compiledir= /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:08:13) [Clang 17.0.6 ]
sys.prefix= /opt/miniconda3/envs/pymc_env
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "16.0.6",
"commands": "x86_64-apple-darwin13.4.0-clang",
"args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "16.0.6",
"commands": "x86_64-apple-darwin13.4.0-clang++",
"args": "-march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/opt/miniconda3/envs/pymc_env/lib, -L/opt/miniconda3/envs/pymc_env/lib, -march=core2, -mtune=haswell, -mssse3, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /opt/miniconda3/envs/pymc_env/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225541074/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/opt/miniconda3/envs/pymc_env=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /opt/miniconda3/envs/pymc_env/include, -mmacosx-version-min=10.9"
}
},
"Machine Information": {
"host": {
"cpu": "x86_64",
"family": "x86_64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "x86_64",
"family": "x86_64",
"endian": "little",
"system": "darwin"
}
},
"Build Dependencies": {
"blas": {
"name": "blas",
"found": true,
"version": "3.9.0",
"detection method": "pkgconfig",
"include directory": "/opt/miniconda3/envs/pymc_env/include",
"lib directory": "/opt/miniconda3/envs/pymc_env/lib",
"openblas configuration": "unknown",
"pc file directory": "/opt/miniconda3/envs/pymc_env/lib/pkgconfig"
},
"lapack": {
"name": "dep4461187856",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/opt/miniconda3/envs/pymc_env/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"SSE",
"SSE2",
"SSE3",
"SSSE3"
],
"found": [
"SSE41",
"POPCNT",
"SSE42"
],
"not found": [
"AVX",
"F16C",
"FMA3",
"AVX2",
"AVX512F",
"AVX512CD",
"AVX512_KNL",
"AVX512_SKX",
"AVX512_CLX",
"AVX512_CNL",
"AVX512_ICL"
]
}
}
Numpy dot module: numpy
Numpy location: /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_rz82ylra
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: ExpandDims{axes=[0, 1]}(0.8)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
module = lnk.compile_cmodule(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
module = c_compiler.compile_str(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2654, in compile_str
raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/opt/miniconda3/envs/pymc_env/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -L/opt/miniconda3/envs/pymc_env/lib -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/numpy/core/include -I/opt/miniconda3/envs/pymc_env/include/python3.12 -I/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/c_code -L/opt/miniconda3/envs/pymc_env/lib -fvisibility=hidden -o /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mb782a9925f26f74c46a75d98e1484e89ff6c5c482e4b63d738d2bb93e667f8f6.so /Users/uqamcka3/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.12.6-64/tmpht9h46p1/mod.cpp
dyld[72915]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv
Referenced from: <E33DCAC4-3116-3019-8003-432FB3E66FB4> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
Expected in: <9918D37F-F19F-30B9-B311-13829B79C3B0> /opt/miniconda3/envs/pymc_env/lib/libtapi.dylib
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)
HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
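The dyld message above suggests a mismatch: the conda clang++ ends up running the Xcode ld, which then loads the conda environment's libtapi.dylib and cannot find a symbol in it. Here is a minimal sketch for checking which tools are being picked up, using only the Python standard library. The helper name toolchain_report is hypothetical, and the lib/libtapi.dylib location is taken from the traceback rather than from any documented layout. clang++ may also locate ld by means other than PATH, so treat the result as a hint only.

```python
import shutil
import sys
from pathlib import Path


def toolchain_report(prefix=sys.prefix):
    """Collect the compiler/linker paths visible from this environment.

    `toolchain_report` is a hypothetical helper, not part of PyTensor.
    """
    libtapi = Path(prefix) / "lib" / "libtapi.dylib"
    return {
        # First clang++ on PATH (the traceback shows the conda one being used).
        "clang++": shutil.which("clang++"),
        # First ld on PATH; the ld that aborted in the traceback is inside Xcode.app.
        "ld": shutil.which("ld"),
        # The library the Xcode ld tried to load, according to the dyld message.
        "env_libtapi": libtapi,
        "env_libtapi_exists": libtapi.exists(),
    }


if __name__ == "__main__":
    for name, value in toolchain_report().items():
        print(f"{name}: {value}")
```

If clang++ resolves inside the conda environment while ld resolves to a system or Xcode path, that would be consistent with the mixed toolchain seen in the error.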
In a previous section I added the line:
conda env config vars set PYTENSOR_FLAGS="blas__ldflags=-framework Accelerate"
to remove this warning:
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Perhaps it is just suppressing the warning rather than fixing the cause, as @drbenvincent was saying.
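One way to see whether the flag is at least reaching the Python process is to inspect the PYTENSOR_FLAGS environment variable directly. This is a minimal sketch, assuming the usual comma-separated key=value format. parse_pytensor_flags is a hypothetical helper, and it only shows what was requested, not whether linking against Accelerate actually succeeds.

```python
import os


def parse_pytensor_flags(raw):
    """Split a PYTENSOR_FLAGS string into a {name: value} dict.

    `parse_pytensor_flags` is a hypothetical helper, not a PyTensor API.
    """
    flags = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = item.partition("=")
        flags[name.strip()] = value.strip()
    return flags


if __name__ == "__main__":
    raw = os.environ.get("PYTENSOR_FLAGS", "")
    print(parse_pytensor_flags(raw))
```

With the value set by the conda command above, this would give {'blas__ldflags': '-framework Accelerate'}. An empty dict would mean the variable is not set in the active environment, for example because the environment was not reactivated after running conda env config vars set.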
In testing his example, I needed to add the two pytensor.config lines below to get the code to actually run:
import pymc as pm
import numpy as np
import pytensor
pytensor.config.gcc__cxxflags = '-L/opt/miniconda3/envs/pymc_env/lib -march=native'
pytensor.config.cxx = '/usr/bin/clang++'
# %%
with pm.Model():
    pm.Normal("x")
    pm.sample()
If I run the code above without either of the pytensor.config flags set, I get a compile error:
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Cast{float64}(-0.5)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1122, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
...
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)
You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_i_5ebbxd
You can find the C code in this temporary file: /var/folders/jr/hs4p74j97ql54jgy5hzw28nc0000gr/T/pytensor_compilation_error_108aax7z
---------------------------------------------------------------------------
CompileError Traceback (most recent call last)
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/vm.py:1227, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
1223 # no-recycling is done at each VM.__call__ So there is
1224 # no need to cause duplicate c code by passing
1225 # no_recycling here.
1226 thunks.append(
-> 1227 node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
1228 )
1229 linker_make_thunk_time[node] = time.perf_counter() - thunk_start
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:119, in COp.make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
118 try:
--> 119 return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
120 except (NotImplementedError, MethodNotDefined):
121 # We requested the c code, so don't catch the error.
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/op.py:84, in COp.make_c_thunk(self, node, storage_map, compute_map, no_recycling)
83 raise NotImplementedError("float16")
---> 84 outputs = cl.make_thunk(
85 input_storage=node_input_storage, output_storage=node_output_storage
86 )
87 thunk, node_input_filters, node_output_filters = outputs
File /opt/miniconda3/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/c/basic.py:1182, in CLinker.make_thunk(self, input_storage, output_storage, storage_map, cache, **kwargs)
...
Inputs types: [TensorType(float32, shape=()), TensorType(float64, shape=()), TensorType(float64, shape=()), TensorType(float32, shape=()), TensorType(float64, shape=())]
HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
Perhaps this is a problem with my Xcode installation?
(pymc_env) uqamcka3@psy-qjlf9kt Random % xcode-select -v
xcode-select version 2408.
(pymc_env) uqamcka3@psy-qjlf9kt Random % clang++ -v
clang version 17.0.6
Target: x86_64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /opt/miniconda3/envs/pymc_env/bin
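Both the compiledir name and the clang++ target above say x86_64, even though the machine is an M2. Here is a minimal sketch for checking whether the Python interpreter itself is an x86_64 build running under Rosetta 2. The helper names rosetta_status and describe_arch are hypothetical. The sysctl.proc_translated key is only expected to exist on Apple silicon Macs, so its absence is reported as None (unknown).

```python
import platform
import subprocess


def rosetta_status():
    """Return True/False if macOS reports translation status, else None."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # sysctl is missing (not macOS) or the key is absent (e.g. an Intel Mac).
        return None
    return out == "1"


def describe_arch():
    """Report the architecture Python sees and the Rosetta status."""
    return {
        "machine": platform.machine(),
        "rosetta": rosetta_status(),
    }


if __name__ == "__main__":
    print(describe_arch())
```

If this reports machine as x86_64 with rosetta as True, the environment was probably created for Intel rather than for Apple silicon, which would be consistent with the x86_64 paths in the output above.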
Ideally, I would like to get the BAP3 environment working, as this textbook would be great to learn from and use.