@atpfm ML scientist here: The Neural Engine has always been fp16 only. When you can use it, it delivers incredible performance at very low power draw. However, there’s no public API to program it; you’re restricted to CoreML, which cannot be used to develop and train models directly (you need PyTorch etc.). And model conversion fails more often than not, because CoreML lacks support for many layer ops. I have high hopes for MLX.
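For illustration, a minimal sketch of such a conversion with coremltools (the toy model, names, and shapes here are invented for the example; real-world models are where the unsupported-op failures bite):

```python
# Minimal sketch: trace a toy PyTorch model, then convert it with
# coremltools. Conversion fails when the traced graph contains an op
# that CoreML has no equivalent for.
import torch
import coremltools as ct


class TinyNet(torch.nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(TinyNet().eval(), example)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    # fp16 precision keeps the model eligible for the Neural Engine.
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)
mlmodel.save("TinyNet.mlpackage")
```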
@MaxLaves So what do you think about the M4 Neural Engine’s supposed 38 trillion operations per second, compared to 18 trillion in the M3? Does that seem plausible without some shenanigans in how those numbers are determined?
@siracusa The Geekbench CoreML NE scores for the M4 iPad are only incrementally better than the M3’s https://www.tomshardware.com/pc-components/cpus/alleged-apple-m4-geekbench-scores-show-incremental-improvement-in-machine-learning-over-last-gen
According to the article, Apple reported 38 TOPS for INT8 inference, which is often used to accelerate LLMs (vs. 18 fp16 TOPS for the M3).
@siracusa Maybe the new thing is support for INT8, whereas the old ANE could only do fp16. I don’t know if this is a hardware or software change. If INT8 runs at twice the fp16 rate (common when hardware packs two INT8 ops into each fp16 lane), 38 INT8 TOPS would be roughly 19 fp16-equivalent TOPS, which would fit the merely incremental Geekbench scores.
@siracusa OK, INT8 acceleration was introduced with the A17 Pro: “In newer hardware, e.g. iPhone 15 pro (A17 pro), there is increased int8-int8 compute available on NE, compared to previous versions.”
https://apple.github.io/coremltools/docs-guides/source/quantization-overview.html
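For illustration, a minimal sketch of what that guide describes: post-training linear INT8 quantization of an existing model’s weights (coremltools ≥ 7 and the model file name are assumptions, carried over from the conversion sketch above):

```python
# Sketch of post-training INT8 weight quantization, following the
# coremltools quantization guide linked above (coremltools >= 7 assumed).
import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("TinyNet.mlpackage")  # hypothetical model file

# Symmetric linear quantization of weights to 8-bit integers,
# applied globally to every op type that supports it.
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
config = cto.OptimizationConfig(global_config=op_config)

quantized = cto.linear_quantize_weights(mlmodel, config=config)
quantized.save("TinyNet_int8.mlpackage")
```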
@siracusa According to Wikipedia, Apple already reported 35 TOPS for the A17 Pro NE. I don’t remember this making big news when it was introduced last year.
https://en.wikipedia.org/wiki/Apple_A17#Comparison_of_A15,_A16_and_A17