Does it use Metal on Mac OS (Apple Silicon)? And if not, how does it compare per...

simonw · on Nov 30, 2023

It uses the GPU on my M2 Mac - I can see it making use of that in the Activity Monitor GPU panel.

jart · on Nov 30, 2023

Correct. Apple Silicon GPU performance should be equally fast in llamafile as it is in llama.cpp. Where llamafile is currently behind is at CPU inference (only on Apple Silicon specifically) which is currently going ~22% slower compared to a native build of llama.cpp. I suspect it's due to either (1) I haven't implemented support for Apple Accelerate yet, or (2) our GCC -march=armv8a toolchain isn't as good at optimizing ggml-quant.c as Xcode clang -march=native is. I hope it's an issue we can figure out soon!

boywitharupee · on Nov 30, 2023

currently, on apple silicon "GPU" <> "Metal" are synonymous.

yes, there are other apis (opengl,opencl) to access the gpu but they're all deprecated.

technically, yes, this is using Metal.