Tangokat's comments

"Scaling up performance from M5 and offering the same breakthrough GPU architecture with a Neural Accelerator in each core, M5 Pro and M5 Max deliver up to 4x faster LLM prompt processing than M4 Pro and M4 Max, and up to 8x AI image generation than M1 Pro and M1 Max."

Are they doubling down on local LLMs then?

I still think Apple has a huge opportunity in privacy-first LLMs, but so far I'm not seeing much execution. Wondering if that will change with the overhaul of Siri this spring.


I think it's just marketing, and the marketing is working. Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

I don't mind it; I own Apple stock. But I'm def not buying into their rebranding of the integrated GPU under the guise of Unified Memory.


> Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS, the only platform which can programmatically interface with iMessage and other Apple ecosystem stuff? It has nothing to do with the hardware really.

Still, buying a brand new Mac Mini for that purpose seems kind of pointless when a used M1 model would achieve the same thing.


It’s exactly that. They are buying the base model just for that. You are not going to do much local AI with those 16GB of RAM anyway; it could be useful for small things, but the main purpose of the Mini is being able to interact with Apple apps and services.


16GB should be enough for TTS/voice models running locally, no? I was thinking about a home assistant setup like that, where the voice is local and the brain is API-based.


I run Ministral for my home knowledge database on a 24GB iMac, along with some other non-agentic LLM things.


Sure, that’s why I said maybe it’s useful for a few things. But the main reason people were recommending the Mini was its price (base model) and having access to Apple services for clawdbot to leverage. Not precisely for local AI.


No one is buying a base model Mac for local LLMs. Everyone is forgetting that PC prices have drastically increased due to RAM and SSDs. Meanwhile, Macs had no such price change… at least for the models that didn’t just drop today. Macs are just a good deal at the moment.


> Meanwhile, Macs had no such price change

Yeah, because Mac upgrade prices were already sky-high, long before the component shortage. 32GB of DDR5-6000 for a PC rocketed from $100 to $500, while the cost of adding 16GB to a Mac was and still is $400.


I'm kind of curious how Apple's supply contracts actually work, because it's currently more attractive to buy a Mac with a lot of RAM than it usually is, relative to a PC. So if it's "we negotiated a price and you supply as much RAM as we sell machines," the company supplying the RAM is getting soaked, because they're having to supply even more RAM to Apple at a below-market price.

But if the contract was for a specific amount of RAM and then people start coming to Apple more for high RAM machines, they're going to exhaust their contract sooner than usual and run out of cheap memory to buy. Then they have to decide if they want to lower their margins or raise the already-high price up to nosebleed levels.


https://www.linkedin.com/pulse/memory-supply-chain-ai-disrup...

  Apple has accepted a 100% price increase for Samsung's LPDDR5X memory, with DRAM supply commitments secured only through the first half of 2026. Tim Cook acknowledged during the Q1 FY2026 earnings call that storage price increases would significantly impact Q2 gross margins. Apple is evaluating ChangXin Memory Technologies (CXMT) and Yangtze Memory Technologies (YMTC) as new supply sources, attempting to rebuild pricing leverage through supply chain diversification.


The new models cost $200 more for each 8GB of RAM you add. Ouch...


That's been the case for years. Not new to the M5s.


> Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS

That's likely only part of the reason. The Mac Mini is now "cheap" because everything else has exploded in price. RAM, SSDs, etc. have all gone up massively. Not to mention the Mac Mini is an easy out-of-the-box experience.


It's not cheap, though. Two weeks ago I bought a computer with a similar form factor (GMKtec G10). Worse CPU and GPU, but the same 16GB of memory and a larger SSD, for 40% of the price of a base Mac Mini ($239 vs $599). It came with Windows preinstalled, but I immediately wiped that to install Linux. Even a used (M-series) Mac Mini is substantially more expensive. It will cost me about an extra penny per day in electricity over a Mac Mini, but I won't be alive long enough for the Mac Mini to catch up on that metric.

I considered the Mac Mini at the time, but it only makes sense if you need the local processing power or the Apple ecosystem integration. It's certainly not cheaper if you just need a small box to make API calls and do minimal local processing.


It's cheap for what you get.

If you just need "a small box to make API calls and do minimal local processing" you can also just buy an RPi for a fraction of the price of the GMKtec G10.

All 3 serve a different purpose; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.


> you can also just buy an RPi for a fraction of the price of the GMKtec G10.

Sadly not really. The Pi 5 8GB CanaKit starter set, which feels like a truer price since it includes the power supply, microSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

A 16GB Pi 5 kit, to match just the RAM capacity, to say nothing of the difference in storage {size, speed, quality} and networking, is then an eye-watering $300.


>Sadly not really. The Pi 5 8GB CanaKit starter set, which feels like a truer price since it includes the power supply, microSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

At that point, buy a used MacBook Air M1.


>you can also just buy an RPi for a fraction of the price

lol. you need to look at RPi 5 prices again. they are insane.


If you need the CPU power in the Mac Mini then it is a pretty good price-to-performance ratio.


> It came with Windows preinstalled, but I immediately wiped that to install linux.

Do you really need OpenClaw now? And not Claude Code + Zapier or Claude Code + cron?

That's the point. If you have a worse CPU and GPU, Windows will be sluggish (it's bloated).


That’s a big “if” at the end. You can always make a computer cheaper “if” you strip down what you need to do with it.

The Mac mini strangely is and has been a very good deal for years now.


There are so few used Mac Minis around; those are all gone, and what's left is to buy new.


Worse than that, they hold their value, so a used M1 Mini is still a few hundred bucks, and saving $200-300 by purchasing a five-generation-older Mini seems like a bad deal in comparison.


Someone came to me excited they got a "deal" on the newest Intel Mac Mini for hosting OpenClaw: the 8GB model for $300. I kind of regret bursting their bubble by telling them you can walk over to Costco (the nearest one at the time of discussion was walking distance) and pay $550 for one with an M4 and 16GB of RAM.


Up until a week ago, the base M4 Mini (16GB RAM/256GB SSD) was $399 at Micro Center; now it's $499. Pretty shocking how good a value that is, IMO.


Damn. Would be awesome to network a bunch over thunderbolt.


That’s just somebody not doing their research and overpaying unfortunately


Just like with GPUs and Bitcoin, there'll be a flood of old hardware on the market eventually.


Depends on what you mean by “eventually”


Bro. The used M1 Minis and Studios are all gone. I was thinking of buying one for local AI before OpenClaw came out, went back to look, and the order book is near empty. Swappa is cleared out. eBay is to the point that the M1 Studio is selling for at least a thousand more.

This arb you're talking about doesn't exist. An M1 Studio with 64GB was $1300 prior to OpenClaw. You're not getting that today.

I would have preferred that too, since I could Asahi it later. It's just not cheap any more. The M4 is a flat $500 at Micro Center.


Can't they simply run macOS in a VM on existing Mac hardware?


Not if you want it to be able to use the hardware identifiers to register for use with iMessage.



I have it running in a macos VM using lume & BlueBubbles on a throwaway iCloud account. A lot of hoops to jump through, though

https://cua.ai/docs/lume https://docs.openclaw.ai/channels/bluebubbles


You aren’t going to run a network connected 24/7 online agent from a laptop because it’s battery powered and portable.


yes, and it's funny that all these critical people don't know this


Why not? The integrated GPUs are quite powerful, and having access to 32+ GB of GPU memory is amazing. There's a reason people buy Macs for local LLM work. Nothing else on the market really beats it right now.


My M4 MacBook Pro for work just came a few weeks ago with 128 GB of RAM. Some simple voice customization started using 90GB. The unified memory value is there.


Jeff Geerling had a video of using 4 Mac Studios each with 512GB RAM connected by Thunderbolt. Each machine is around $10K so this isn't cheap but the performance is impressive.

https://www.youtube.com/watch?v=x4_RsUxRjKU


If $40k is the barrier to entry for "impressive," that doesn't really sell the use case of local LLMs very well.

For the same price in API calls, you could fund AI driven development across a small team for quite a long while.

Whether that remains the case once those models are no longer subsidized, TBD. But as of today the comparison isn't even close.


It’s what a small business might have paid for an on-prem web server a couple of decades ago, before clouds caught on. I figure if a legal or medical practice saw value in LLMs, it wouldn’t be a big deal to shove $50k into a closet.


You would still have to do some pretty outstanding volume before that makes sense over choosing the "Enterprise" plan from OpenAI or Anthropic if data retention is the motivation.

Assuming, of course, that your legal team signs off on their assurance not to train on or store your data with said Enterprise plans.


At least with the server you know what you are buying.

With Anthropic you're paying for "more tokens than the free plan," which has no concrete meaning.


With an M3 Max with 64GB of unified RAM you can code with a local LLM, so the bar is much lower.


But why? Spending several thousand dollars to run sub-par models when the break-even point could still be years away seems bizarre for any real use case where your goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day.

To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context the sell on a stack of Mac Pros seems misguided at best.


I ran the Qwen 3.5 35B-A3B Q4 model locally on a Ryzen server with a 64k context window at 5-8 tokens a second.

It is the first local model I've tried which could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it is really close. It isn't Opus 4.5 yet, but very promising given the size. Local is definitely getting there, even without GPUs.

But you're right, I see no reason to spend right now.


Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?


I just start llama.cpp's server with the GGUF, which creates an OpenAI-compatible endpoint.

The session so far is stored as a messages array in a file like /tmp/s.json. Claude reads that file, appends its response/query, sends it to the API, and reads the response.

I simply wrapped this process in a Python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)
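
For anyone wanting to try it, the core relay loop is roughly the sketch below. This is a minimal sketch of what's described above, not the commenter's actual script: it assumes llama-server is running on its default port (8080), uses /tmp/s.json as the session file, and omits the tool-calling and Claude sides entirely.

  import json
  import urllib.request

  SESSION = "/tmp/s.json"  # running messages array for the session
  ENDPOINT = "http://localhost:8080/v1/chat/completions"  # llama-server default

  def relay(user_text: str) -> str:
      # Load the conversation so far (or start fresh).
      try:
          with open(SESSION) as f:
              messages = json.load(f)
      except FileNotFoundError:
          messages = []

      # Append the new query and send the whole history to the
      # OpenAI-compatible endpoint that llama.cpp exposes.
      messages.append({"role": "user", "content": user_text})
      body = json.dumps({"messages": messages}).encode()
      req = urllib.request.Request(
          ENDPOINT, data=body, headers={"Content-Type": "application/json"}
      )
      with urllib.request.urlopen(req) as resp:
          reply = json.load(resp)["choices"][0]["message"]

      # Persist the model's answer back into the session file.
      messages.append(reply)
      with open(SESSION, "w") as f:
          json.dump(messages, f)
      return reply["content"]

  if __name__ == "__main__":
      print(relay("Download today's quotes and chart them."))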


Sometimes you can't push your working data to third party service, by law, by contract, or by preference.


I started doing it to hedge myself against the inevitable disappearance of cheap inference.


Sure, but now double the team size. Double it again.

Suddenly that $40k is quite reasonable, because you’ll never pay another dollar for at least 2-3 years.


Would you?

2-3 years ago people were fantasizing about running local models on a consumer NVIDIA RTX GPU.


It's not. I've got a single one of those 512GB machines and it's pretty damn impressive for a local model.


Assuming you ran the gamut up from what you could fit on 32 or 64GB previously, how noticeable is the difference between models you can run on that vs. the 512GB you have now?

I've been working my way up from a 3090 system and I've been surprised by how underwhelming even the finetunes are for complex coding tasks, once you've worked with Opus. Does it get better? As in, noticeably and not just "hallucinates a few minutes later than usual"?


I've tried to use a local LLM on an M4 Pro machine and it's quite painful. Not surprised that people into LLMs would pay for tokens instead of trying to force their poor MacBooks to do it.


Local LLM inference is all about memory bandwidth, and an M4 Pro only has about the same as a Strix Halo or DGX Spark. That's why the older Ultras are popular with the local LLM crowd.


Qwen 3.5 35B-A3B and 27B have changed the game for me. I expect we'll see something comparable to Sonnet 4.6 running locally sometime this year.


This would be an absolute game changer for me. I am dictating this text right now via a local model, and I think this is the way to go. I want to have everything locally. I'm not opposed to AI in general or LLMs in general, but I think that sending everything over the pond is a no-go. And even if it were European, I still wouldn't want to send everything to some data center and so on. So I think this would be a good development, and I would even buy an Apple device for the first time since the iPod just for that.


Could be, but it likely won't be able to support the massive context window required for performance on par with Sonnet 4.6.


I’m super happy with it for embedding, image recognition, and semantic video segmentation tasks.


What are the other specs, and how does your setup look? You need a minimum of 24GB of RAM to run models of 16GB or less.


This is typically true.

And while it is stupid slow, you can run models off hard drive or swap space. You wouldn’t do it normally, but it can be done to check an answer from one model versus another.


Tokens per second is abysmal no matter how much RAM you have.


Some models run worse than others, but I have gotten reasonable performance on my M4 Pro with 24GB of RAM.


48GB MacBook Pro. All of the models I've tried have been slow and have also given terrible results.


Try a piece of software called TG Pro; it lets you override fan settings. Apple likes to let your Mac burn in an inferno before the fans kick in. It gives me more consistent throughput. I have less RAM than you and I can run some smaller models just fine, with reasonable performance. GPT 20B was one.


Local LLMs are useful for stuff like tool calling


What models are you using? I’ve found that SOTA Claudes outperform even GPT-5.2 so hard on this that it’s cheaper to just use Sonnet, because the number of output tokens needed to solve a problem is so much lower that TCO is lower. I’m in SF, where home power is 54¢/kWh.

Sonnet is so fast, too. GPT-5.2 needs reasoning tuned up to get tool calling reliable, and Qwen3 Coder Next wasn’t close. I haven’t tried Qwen3.5-A3B, but I'm hearing rave reviews.

If you’re successfully using some model, knowing that alone is very helpful to me.


I'm not really into AI and LLMs; I personally don't like anything they output. But the people I know who are into it and into running their own local setups are buying Studios and Minis for their at-home local LLM setups. Really, everyone I personally know who is building their own local LLM setup is doing this. I don't know anyone anymore buying other computers and NVIDIA graphics cards for it.


The biggest problem with personal ML workflows on Mac right now is the software.


I'm curious to know what software you're referring to.


Yes


I think people buying those don't realize the requirements to run something as big as Opus; they think those gigabytes of memory on a Mac Studio/Mini are a lot, only to find out it's "meh" in the context of LLMs. Plus, most buy it as a gateway into the Apple ecosystem for their Claws, iMessage for example.

> But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.

But it is unified memory? Thanks to Intel iGPUs, the term has been tainted for a long time.


We had a workshop 6 months ago, and while I've always been sceptical of OpenAI et al.'s silly AGI/ASI claims, the investments have shown the way to a lot of new technology and have let a genie out of the bottle that won't be put back.

Now, extrapolating in line with how Sun servers around the year 2000 cost a fortune and can be emulated by a $5 VPS today, Apple is seeing that they can maybe grab the local LLM workloads if they act now with their integrated chip development.

But to grab that, they need developers to rely less on CUDA via Python, or to have other proper hardware support for those environments, and that won't happen without the hardware being there first and the machines being able to be built with enough memory (refreshing to see Apple support 128GB, even if it'll probably bleed you dry).


I feel like the push by devs towards Metal compatibility has been 10x that towards AMD. I assume that's because the majority of us run MacBooks.


I think that might be partly because on regular PCs you can just go and buy an NVIDIA card instead of futzing around with software issues, and those on laptops probably hope that something like ZLUDA will solve it via software shims or MS-backed ML APIs.

Basically, too many choices to "focus on" makes none a winner except the incumbent.


Who is "us" in this case? Majority of devs that took the stack overflow survey use Windows:

https://survey.stackoverflow.co/2025/technology/#1-computer-...


That's the broad developer community. 90%+ of the engineers at Big Tech and the technorati startups are on macOS, with 5% on Linux and the other 5% on Windows.


> 90%+ of the engineers at Big Tech and the technorati startups

The US ones? Is that why we have Deepseek and then other non-US open-source LLMs catching up rapidly?

World view, please. The developer community is not US-only.


You’ll see a lot of MacBooks in Beijing’s Zhongguancun, where all the tech companies are, but there are a lot of students there as well, so who knows. You need to go out to the suburbs where Lenovo has offices to stop seeing them. I know Apple is common in Western Europe, having lived there for two years (but that was 20 years ago; I lived in China for 9 years after that).

It wouldn’t surprise me if the Deepseek people were primarily using Macs. Maybe Alibaba might be using PCs? I’m not sure.


I would also expect that the Deepseek devs are using MacBooks. If not, they may be using Linux; Windows is possible, of course, but not likely IMHO. I have no knowledge of that area though, so it would be interesting to hear any primary sources or anecdotes.


Deepseek is in Hangzhou, so I guess they are. GDP per capita in Zhejiang is pretty high, even more so for Hangzhou. If you ever visit, it feels like a pretty nice place (especially if you can get a villa around Xihu). I also visited ZJU once, and it was pretty MacBooky, but I don't have as much experience there as with Beijing's Zhongguancun.


I live in Germany, not the US. I mentioned in another comment that, aside from the fact that Deepseek mainly targets Linux, I expect the Deepseek devs are using Macs or Linux.


Source?


Working in three countries, working in big tech and startups, talking to people.


Working there?


I think it's reasonable to say that the people responding to surveys on Stack Overflow aren't the same people pushing the state of the art in local LLM deployment (which doesn't prove that that crowd is Apple-centric, of course).


Perhaps. Though Windows has been the majority share even when Stack Overflow was at its peak, and before.


It's not the whole answer, but SO came from the .NET world and focused on it first, so it had a disproportionately MS-heavy audience for some time. GitHub had the same issue the other way around: Ruby was one of GitHub's top five languages for its first decade for similar reasons.


The majority of devs are in the global south, I presume.


Which majority?

I certainly only use Macs when a project assigns them; beyond that, there are plenty of developers out there whose job has nothing to do with what Apple offers.

Also, while Metal is a very cool API, I'd rather play with Vulkan, CUDA, and DirectX, as does the large majority of game developers.


Honestly though, gamedevs really are among the biggest Windows stalwarts due to SDKs and older 3D software.

The only groups of developers more tied to Windows that I can think of are probably embedded people tied down by weird hardware SDKs, and Windows Active Directory-dependent enterprise people.

Outside of that almost everyone hip seems to want a Mac.


80% of the desktop market has to have their applications developed by someone, at least until software replicators replace them.

Everyone hip alright, or at least those that would dream to earn a salary big enough to afford Apple taxes.

Remember there are world regions where developers barely make 1,000 euros per month.


The only "push" towards Metal compatibility there's been has been complaints on github issues. Not only has none of the work been done, absolutely nobody in their right mind wants to work on Metal compatibility. Replacing proprietary with proprietary is absolutely nobody's weekend project. or paid project.


If coding by AI was truly solved then it would be done with AI, right?


Torch MPS support on my local MacBook outperforms a CUDA T4 on Colab.
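
For anyone wanting to reproduce the comparison, here's a minimal sketch (assuming PyTorch's Metal backend, MPS, is what's meant above) of selecting the Apple GPU with a CPU fallback:

  import torch

  # Use Apple's Metal (MPS) backend when available, otherwise fall back to CPU.
  device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

  x = torch.randn(4096, 4096, device=device)
  y = x @ x  # the matmul runs on the Apple GPU when device is "mps"
  print(device, y.shape)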


Except CUDA feels really cozy, because like Microsoft, NVidia understands the Developers, Developers, Developers mantra.

People always overlook that CUDA is a polyglot ecosystem: the IDE and graphical debugging experience, where one can even single-step GPU code, and the library ecosystem.

And as of last year, NVidia has started to take Python seriously, and now with the cuTile-based JIT it is possible to write CUDA kernels in pure Python, rather than having Python generate C++ code that other tools then ingest.

They are getting ahead of Modular, with Python.


> Are they doubling down on local LLMs then?

Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high-level perspective, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix mults (GEMM) and the latter is matrix-vector mults (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
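
To make the GEMM/GEMV distinction concrete, an illustrative sketch (the dimensions are made up):

  import numpy as np

  d_model, n_prompt = 4096, 512
  W = np.random.randn(d_model, d_model).astype(np.float16)

  # Prefill: the whole prompt's activations hit the weights at once (GEMM),
  # a fat tile that matmul hardware like the Neural Accelerators can exploit.
  prefill = np.random.randn(n_prompt, d_model).astype(np.float16) @ W

  # Decode: one token's activation vector at a time (GEMV), so speed is set
  # by how fast the weights stream from memory, not by matmul throughput.
  decode = np.random.randn(d_model).astype(np.float16) @ W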


There already are a bunch of task-specific models running on their devices, it makes sense to maintain and build capacity in that area.

I assume they have a moderate bet on on-device SLMs in addition to other ML models, but not much planned for LLMs, which at that scale might be good as generalists but very poor at guaranteeing success for each specific minute task you want done.

In short: 8GB storing tens of very small, fast, purpose-specific models is much better than a single 8GB LLM trying to do everything.


Probably possible for pure coding models. I see on-device models becoming viable and usable in like 2-3 years.


> Are they doubling down on local LLMs then?

Apple is in the hardware business.

They want you to buy their hardware.

People using the cloud for compute is essentially competitive with their core business.


"Doubling down on already being the best hardware for local inference"


"Apple Intelligence is even more capable while protecting users’ privacy at every step."

Remains to be seen how capable it actually is. But they're certainly trying to sell the privacy aspect.


> Remains to be seen how capable it actually is.

It's the best. We all turned it off. 100% privacy.


Apple absolutely has a massive opportunity here because they used a shared memory architecture.

So as most people in or adjacent to the AI space know, NVidia gatekeeps its best GPUs with the most memory by making them eye-wateringly expensive. It's a form of market segmentation. So consumer GPUs top out at 16GB (5090 currently) while the best AI GPU (H200?) is at 141GB (I just had to search)? I think the previous gen was 80GB.

But these GPUs are north of $30k.

Now the Mac Studio currently tops out at 512GB of SHARED memory. That means you can potentially run a much larger model locally without distributing it across machines. Currently that retails at $9500, but that's relatively cheap, in comparison.

But, as it stands now, the best Apple chips have significantly lower memory bandwidth than NVidia GPUs and that really impacts tokens/second.

So I've been waiting to see if Apple will realize this and address it in the next generation of Mac Studios (and, to a lesser extent, MacBook Pros). The H200 seems to be 4.8TB/s. IIRC the 5090 is ~1.8TB/s. The best Apple offers is (IIRC) 819GB/s on the M3 Ultra.

Apple could really make a dent in NVidia's monopoly here if they address some of these technical limitations.

So I just checked the memory bandwidth of these new chips, and it seems the M5 is 153GB/s, the M5 Pro ~300GB/s, and the M5 Max ~600GB/s. I had been hoping for higher; this isn't a big jump from the M4 generation. I suspect the new Studios will probably barely break 1TB/s.
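
To see why this matters for tokens/second, a rough ceiling on decode speed is bandwidth divided by the bytes of weights streamed per token. A back-of-envelope sketch using the figures above and a hypothetical ~40GB model (roughly 70B parameters at 4-bit):

  # Illustrative only: tokens/s ceiling ~= memory bandwidth / active model bytes.
  model_gb = 40  # hypothetical 70B model at 4-bit quantization
  for name, gb_s in [("H200", 4800), ("RTX 5090", 1800), ("M3 Ultra", 819), ("M5 Max", 600)]:
      print(f"{name}: ~{gb_s / model_gb:.0f} tokens/s upper bound")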


>So consumer GPUs top out at 16GB (5090 currently)

5090 has 32GB, and the 4090 and 3090 both have 24GB.


It will be interesting to see the specs on an M5 Ultra. We'll probably have to wait until WWDC at the earliest to see it, though.


Hard to get the bandwidth of a 6000+ bit HBM memory bus out of a 512- or 1024-bit memory bus tied to DDR... I think it's also just physically tough to tie 512 gigs in close enough to the GPU to run at those speeds. But yeah, I wish there were a very competitive local option too, short of spending $50k+.


There is a reason those data center GPUs are so expensive: it’s not trivial to “just” 5x the memory bandwidth.


• Having had NPU cores since the M1 would seem to verify that running models has been the game plan for a while. LLMs coming along can only have increased that focus.

• Studios with Ultra Mx, now 4-way RDMA over Thunderbolt 5, and enormous RAM and SSD options, suggest a strong focus. I don't know what else that RAM would be intended for. Four Studio Ultras (total of 360 GPU cores with M5 Ultras?) with 2TB of unified RAM is a local model beast.

• They refashioned their GPU cores to better support both graphic and neural processing, despite already having focused NPU cores.

I would say they have been leaning into local models for several years.

I expect we will see more models being optimized for smaller sizes, as demand for them increases. With hardware performance and neural focus trending up, and model requirements/quality trending down, the next few years will be interesting times.

What would make me happy: Ultra x 2 (i.e. 2xUltra, 4xMax, 8xPro, 16xM5) packaging in the Studio. With 8-way RDMA. Mac Kong. Perhaps Apple will start making server cards again.


  Are they doubling down on local LLMs then?
The Neural Accelerator was already present in the iPhone 17 and the M5 chip. This is not new for the M5 Pro/Max.

Apple's stated AI strategy is local where it can, cloud where it needs. So "doubling down"? Probably not. But it fits their strategy.


Given all the supply issues w/ Nvidia, I think Apple's AI strategy should be: local AI everything (not just LLMs), but also make Metal competitive w/ CUDA. Their ace in the hole is the unified memory model.


The hardware capabilities that make local LLMs fast are useful for a lot of different AI workloads. Local LLMs are a hot topic right now so that’s what the marketing team is using as an example to make it relatable.


But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614GB/s for the M5 Max vs 546GB/s for the M4 Max. Where is this 4x improvement coming from?

I think I'll pass on upgrading.


It’s prompt processing, so prefill; that’s compute-bound, not memory-bound.


The 4x is on time to first token; it's on the graph.


> Are they doubling down on local LLMs then?

Honestly, I think that's the move for Apple. They do not seem to have any interest in creating a frontier lab/model; why would they, given the capex and how far behind they are.

But open-source models (Kimi, Deepseek, Qwen) are getting better and better, and Apple makes excellent hardware for local LLMs. How appealing would it be to have your own LLM that knows all your secrets and doesn't serve you ads/slop, versus OpenAI and SCam Altman having all your secrets? I would seriously consider it even if the performance was not quite there. And no need for a subscription + CLI tool.

I think Apple is in the best position to have native AI, versus the competition, which ends up being edge nodes for the big 4 frontier labs.


RE frontier models/hardware: I'm interested to see what happens with their "Private Cloud Compute" marketing concept now that they're moving Siri AI experiences from Apple servers to Google servers.


You can deliver confidential compute on GCP.


A useful LLM that needs 64GB of RAM and mid-double-digit core counts is not useful for 99% of their customers. The LLMs they have on iPhone 17s certainly cannot do anything useful other than summarization and the like. It's a hardware constraint that they have.


> doubling down on local LLMs

I do think it'll be common to see pros purchasing expensive machines approaching £25k or more if they can run SOTA multi-modal LLMs faster and locally.


It's more that they can't think of anything else that could possibly need that much compute.


Apple's AI strategy really kind of threads the needle cleverly.

"AI" (LLMs) may or may not have a bubble-pop moment, but until it does Apple get to ride it on these press releases and claims. But if the big-pop occurs, then Apple winds up with really fantastic hardware that just happens to be good at AI workloads (as well as general computing).

For example, image classification (e.g. face recognition/photo tagging), ASR+vocoders, image enhancement, OCR, et al, were popular before the current boom, and will likely remain popular after. Even if LLM usage dries up/falls out of vogue, this hardware still offers a significant user benefit.


LLM usage is not very likely to "dry up".

What is more likely to happen, though, is that it stops taking multiple tens of billions of dollars of datacenter and capital to build out models, and the performance against LLM benchmarks starts to max out, to the point where throwing more capital at it doesn't make enough of a difference to matter.

Once the costs shrink below $1B, Apple could start building their own models with the $139B in cash and marketable securities that they have, while everyone else has burned through $100B trying to be first.

Of course the problem with this strategy right now is that Siri really, really sucks. They do need to come up with some product improvements now so that they don't get completely lapped.


And they will most likely also be the last to benefit from hypothetical efficiency gains because they haven't been building up expertise (by burning billions) yet.


You can hire expertise off your competitors.

Being able to Greenfield something new is a tempting pitch to use to poach employees.

And first to market often doesn't win, or else WebVan would still be doing grocery deliveries. We tend to overstate the first-mover advantages because we more easily remember the cases where that turned into lasting dominance while forgetting all the companies that died to first-mover disadvantages.


Those things could likely just run fine on the GPU, though.


They could run fine on the CPU too. But these are mobile devices, therefore battery usage is another significant metric. Dedicated hardware is more energy efficient than general hardware, and GPU in particular is a power-hog.


Exactly. It's the same thing as video or audio encoding and decoding. Sure the CPU could do it, potentially use the GPU, but having actual hardware encoders and decoders for the most common codecs will save a lot of energy.


Not if GPU RAM is a limiter. Which it is for most models.

Unified memory is a serious architectural improvement.

How many GPUs does it take to match the RAM, and make up for the additional communication overhead, of a RAM-maxed Mac? Whatever the answer, it won’t fit in a MacBook Pro’s physical and energy envelopes. Or that of an all-in-one like the Studio.


I've been so disappointed in Apple's lack of execution on this. There is so much potential for fantastic local models to run and intelligently connect to cloud models.

I just don't get why they're dropping the ball so much on this.


Because it won’t sell enough hardware to matter to them.

They aren’t dropping the ball, they are being smart and prudent.


Downvote all you want. Point blank, they are dropping the ball.


> Are they doubling down on local LLMs then?

I love the push to local LLMs. But it's hilarious how reluctant Apple was a few years ago to even mention "AI" in its keynotes, and fast forward a couple of years and they've fully embraced it. I mean, I like that they embraced it rather than be "different" (stubborn) and stay behind the tech industry. It's the smart choice. I just think it's funny.


The topic is MacBooks, so my criticism is a little off. However, I really don't believe in this "local LLM" promise from Apple. My phone already gets noticeably warm if I answer 5 WhatsApp messages, and loses 5% of battery in the process. I highly doubt Apple will have a usable local LLM that doesn't drain my battery in minutes before 2030.


Something is not right if WhatsApp is seriously draining your phone like that. Admittedly I'm not a big WhatsApp user, but my iPhone hasn't had any trouble like that with it.


Yeah is OP using an iPhone X?


Bro that’s WhatsApp. Meta is known for their dirty mobile code


have you seen that GitHub repo where they unlock the true power of the NE?


Have a link?


Didn't they announce a partnership with Google Gemini?


Honestly, they can afford to keep waiting another year or two for on-device models at the size they're targeting to become powerful enough.


looks like this will be their angle for the whole agentic AI topic


It is simply marketing nonsense. What they really mean (I think) is that they support matrix multiplication (matmul) at the hardware level; given that AI is mostly matrix multiplications, you'll get much faster inference (and some increase in training too) on this new hardware. I'm looking forward to seeing how fast a local 96GB+ LLM is on the M5 Max with 128GB of RAM.


We've already established in this thread that memory bandwidth isn't that much greater than the M4 Max's (about 12%?). However, I wonder if batched inference will benefit greatly from the vastly improved compute. My guess is that parallel usage of the same model will be a couple of times faster. So single-"threaded" use isn't that much better, but say you want to run a lot of batch jobs; it'd be way faster?
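
The intuition, sketched below with made-up shapes: each decode step streams the weights from memory once regardless of batch size, so stacking concurrent requests turns many bandwidth-bound GEMVs into one GEMM that the new matmul hardware can actually put to work.

  import numpy as np

  d_model, batch = 4096, 32
  W = np.random.randn(d_model, d_model).astype(np.float16)

  # One weight read serves all 32 in-flight requests: a single GEMM instead
  # of 32 separate GEMVs, so arithmetic per byte fetched goes up 32x.
  acts = np.random.randn(batch, d_model).astype(np.float16)
  out = acts @ W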


Is this a reply to a different comment?


It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second.

So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel.

Those are the compute needs.

We just need compute everywhere as fast as possible.


What % of users actually care that much about local LLMs? It appears to still be an inferior (though maybe decent) experience compared to ChatGPT etc., and it requires very top-end hardware. Is privacy _that_ important to people when their Google search history has been a gateway to the soul for years? I wonder if these machines would cost significantly less (or put the cost toward other things, e.g. more CPU cores) without this emphasis on LLMs.


Privacy is definitely not a concern for the layman, but it is for lots of people, especially pro users. I also haven't made a Google search in years.

I also haven’t seen any improvements in the frontier models in years, and I’m anxiously awaiting local models to catch up.


> I also haven’t made a google search in years.

That makes you so far out at the end of the curve that even professionals can't see you.


> I still think Apple has a huge opportunity in privacy first LLMs

This correlation of Apple and privacy needs to rest. They have consistently proven otherwise, despite heavily marketing themselves as "privacy-first."

https://www.theguardian.com/technology/2019/jul/26/apple-con...


I think it's a little telling that the best you can do is a seven-year-old article.


No other company makes you tell them every application you install on your device. No other company makes you tell them every location you read from your GPS sensor.


Please, source this ridiculous claim


Location: https://www.apple.com/legal/privacy/data/en/location-service...

> To use features such as these, you must enable Location Services on your iPhone and give your permission to each app or website before it can receive location data from Location Services

> By enabling Location Services for your devices, you agree and consent to the transmission, collection, maintenance, processing, and use of your location data and location search queries by Apple and its partners and licensees to provide and improve location-based and road traffic-based products and services.

Android and every other consumer general purpose OS lets you read GPS coordinates from the sensor without telling anyone.

App installs: Any app installed from the App Store obviously tells Apple you installed it. Apple does certificate verification for every side-loaded app, where Apple is the CA. There is no way to install an app on iOS without telling Apple.

Android and every other consumer general purpose OS lets you install apps without telling anyone.



I’m confused, because to me that article just said the phone knows a lot about itself (things like what applications are installed), and that if someone gets into the phone they can use forensic tools to learn those things too. I didn’t see anything about Apple getting that information, and nothing about Macs. The location stuff is very well known and is an inherent property of any modern networked device, unfortunately.



So, somehow now they are the beacons of privacy and we should just ignore their history of spying on their users?


I think it's all about relativity. Are they private compared to an open-source, privacy-focused OS like GrapheneOS and the fantastic folks running that project? No. Are they more private than a company like Meta or Google, whose incentives around privacy are much worse than Apple's? Probably.

Do I wish Apple was way more transparent and gave users more control over gatekeeper and other controversial features that erode privacy? Absolutely.


Not for everything. Apple initially focused on edge AI that runs locally per device. It didn't work out well the first try, but I would still bet on them trying again once compute catches up. Besides, they still have a better track record than the other tech giants.


Incredibly depressing comments in this thread. He keeps OpenClaw open. He gets to work on what he finds most exciting and helps reach as many people as possible. Inspiring, what dreams are made of really. Top comments are about money and misguided racism.

Personally I'm excited to see what he can do with more resources, OpenClaw clearly has a lot of potential but also a lot of improvements needed for his mum to use it.


He said on the Lex Fridman podcast that he has no intention of joining any company; that was a couple of days ago.


Ah but that was before he saw the comp packages. But no judgement. The tool is still open source. Seems like a great outcome for everyone.


It sounded to me like he's choosing between Meta and OpenAI:

https://www.youtube.com/watch?v=YFjfBk8HI5o&t=8976


where in the podcast (transcript: https://lexfridman.com/peter-steinberger-transcript/) did he say that?



Lex Fridman is a fraud/charlatan and shouldn’t be listened to.


Well, things change fast in the age of AI


He literally said the exact opposite.


He had to keep the grift going until the very last minute.


Frankly, I hope he maximized the amount of money he made. It's a once-in-a-lifetime opportunity. And nobody knows where AI is headed, or whether OpenAI will even be in existence in a few years, given their valuation and the amount of money they need to burn to keep up.


"Lower cost to reach customers = lower product and service prices"

This is economically illiterate. Advertising is not a discount mechanism; it is a tax on the consumer. When I buy a product heavily marketed on Instagram or Google, I'm paying for the product, plus the auction bid price required to acquire me, plus the margin of the ad-tech middlemen (which are trillion-dollar companies).

You are conflating "information distribution" with "persuasive surveillance." In a world without behavioral advertising, businesses compete on quality and reputation, not on who can exploit the most psychological vulnerabilities to manufacture demand.

As for innovation: the current ad ecosystem has killed organic discovery. You can't build a "micro-business" based on merit anymore. The winner SHOULD be the engineer who solved a hard problem efficiently. But instead the winner is the dropshipper who cracked the arbitrage spread between a cheap, garbage product and a highly manipulative Facebook ad campaign.


The Americans on HN driving tech, science and innovation are enabling Trump to do this. Without you he would be nothing. Where is your integrity? Do you think having no allies makes you more safe? Is this really the world you want?


How are US tech folks more enabling Trump than anybody else who pays tax there?


Some, by working for companies (Big Tech) that have given little resistance to Trump and have rather funded his ballroom, etc. Sadly, everyone quitting those companies would not really be a reasonable solution either, though there are more possible actions than that.


"Elon Musk, the world’s richest man, spent more than $290 million supporting Donald Trump and his MAGA allies on the campaign trail last year." [1]

"Exclusive: How Palantir's Alex Karp went full MAGA" [2]

Look at the All-In Podcast (tech VCs): they are all in support of this administration.

[1] https://www.the-independent.com/news/world/americas/us-polit...

[2] https://www.axios.com/2025/10/23/trump-alex-karp-palantir-ma...


Context was "The Americans on HN driving tech, ...". I'm not sure that includes Elon.


Paying $200 for Pro at the moment. If a single ad shows up anywhere I'm out. In the free tier? Well.. it's sad but inevitable.


They might not show you ads but they can still recommend you certain products based on a sales commission.


I pay almost the same amount for YouTube Premium as for ChatGPT Plus. And when I see creators inserting their own sponsored ads, I get frustrated. It stopped YouTube's own ads but not the product placements and other ads by the creators.


Not author.

SponsorBlock [0] works pretty well for me (on FF):

"SponsorBlock is an open-source crowdsourced browser extension and open API for skipping sponsor segments in YouTube videos."

[0]: https://sponsor.ajay.app/


Agreed, this would also make me angry.


And hide other products that don't bid enough in the keyword auction.


I'm genuinely curious about the unit economics of the expensive plans for each of these AI plays. It's common to parrot the idea that companies are still losing money on them, but hard to find actual evidence.


I'm also on Pro, and I know that I won't stop even if ads subtly change the results. I expect all big LLMs to make the switch at exactly the same time, but later than the free versions.


The ad will inevitably show up; the question is, will you recognize it?


https://oldcoinbad.com/p/long-degeneracy

In the author’s words, long degeneracy represents “a belief that the world will only get more degenerate, financialized, speculative, lonely, tribal and weird”.

The most concise and holistic explanation of this trend is:

"As real returns compress, risk increases to compensate".


The combination of anime children, terms like "degeneracy", and crypto shilling is frankly extremely repellant.


He's living proof, I suppose. Why care about image if you think the world is burning around you anyway?


I dislike the author's framing of this in terms of right-wing meme culture, but almost all of the analysis is nevertheless correct. An orthodox Marxist could make essentially the same argument using different terminology - except about the inevitability of the trend continuing forever.


> except about the inevitability of the trend continuing forever.

Which is important because it yields completely different behavior from the believer...


Great post, thanks for linking to that. The prevalence of crypto millionaires is definitely a big factor. Especially for people under 35; when you see your peers becoming rich from essentially random behaviors (like buying the right coin), it really undermines the idea that success is linked to hard work. And that impression funnels back into culture.


The big Asian bookies don't ban you if you win; they use your sharp bets to improve their price accuracy. It's not legal to bet with them if you're from the US, though (land of the free??). The biggest betting syndicates use platforms like Punterplay to place bets (often via API) at multiple bookies (Pinnacle, Singbet, SBObet, Betfair, Matchbook, 3ET, VX, etc.) at the same time.

In a somewhat ironic turn of events, the more regulation you have, the worse it is for the customer. Big regulatory burdens require the bookies to extract more from the users, making the offerings more predatory. This is also why the likes of Kalshi can provide a better product to customers at the moment: because they ignore all the regulation.


Casinos have a ton of leverage in some states. Here in Nevada, MGM, Caesars, and Wynn, thanks to their expansion, are effectively treated as too big to fail and given a huge amount of deference by the gaming commission in how they operate.

But there are also incredibly problematic protectionist regulations that I and several other residents who didn't really know each other tried to get rid of through the admin law process, primarily to allow remote signups, which would also let out-of-state entities set up shop without literally having a physical casino. Having to physically go to a casino and sign up in person was onerous and clearly pointless, then impossible during the pandemic, and became a really silly charade.

What was supposed to start as public meetings right before the pandemic got dragged out. Meetings would get rescheduled at the last minute, and casinos offered entirely spurious rationales like "there aren't enough local datacenters" (Google Cloud's Henderson datacenter is surely sufficient for in-state traffic?) or that they would want taxpayer money for potential loss of revenue (capitalism, dude, what are you afraid of?). Meetings would get scheduled in Carson City, and that's literally six hours away by car. Agenda items would suddenly be altered. It was a hot mess.

We managed to get iGaming legalized in theory, but they straight up never even pretended to start working on regulations for it, and now, with the 90% loss-deduction limit the OBBB put on the IRS books, you basically have a 12.5% house edge on any line to start if it's properly priced. My model can beat 2.5%, but 12.5% is insane. If the feds are going to ban pros constructively, well, I can't out-lobby a casino.

And the pro betting constituency isn't big enough to pander to, frankly. If there's action, it can't actually happen on shore. I realize that "people who can beat the books due to specialist knowledge and can bankroll drawdowns to the extent that it returns long-term profit" is also not publicly sympathetic, and generally people either think we're touts (if it makes me money, touting absolutely won't help me; in fact, the fewer people I have to interact with, the better) or something. Wagering by hand sucks, but no model is perfect, just some are more useful than others, and someone in accounting may be able to figure out that ban-or-bankrupt is not a sustainable strategy for running books. But with the feds involved to put that imprimatur of authority in writing, I guess I'm never getting my limits lifted. Good luck finding stable liquidity elsewhere.


Worth noting that controlling consumption via extra/reduced tax on specific products is debated a lot in Denmark. Namely, cigarettes have a high added tax (about 2 kr / $0.30 USD per cigarette). Increasing the tax, and thus the price of cigarettes, had a fairly large effect on consumption (0.13-0.82% fewer cigarettes consumed for every 1% price increase) [1]. Recently it has been debated whether to remove the VAT from vegetables and fruit to increase consumption of those.
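
A quick worked example of that elasticity range (the 10% price increase is hypothetical):

  # Roughly: each 1% price increase cuts consumption by 0.13-0.82%.
  price_increase = 10  # percent, hypothetical
  for elasticity in (0.13, 0.82):
      print(f"elasticity {elasticity}: ~{elasticity * price_increase:.1f}% fewer cigarettes")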

The same logic is used for this book VAT exemption (which is good, in my opinion), though I doubt we'll see the same effect. Young people not reading is a complex problem to solve, but books are really expensive to buy, so it's a good place to start.

[1] https://vidensraad.dk/sites/default/files/node/field_report_...


Americans do not realize how much damage Trump has done to trust in American services. Europeans used to consider America an ally, the same as other European countries; now it is more like an unreliable trade partner. Microsoft tried to reassure the Europeans [1], but not even a month later they were forced to disable the email account of ICC Chief Prosecutor Karim Khan [2] due to sanctions from Trump, voiding their reassurance completely. What happens when Trump gets mad at Denmark for not giving him Greenland and forces Microsoft to turn off Danish services?

Every large European company and all of the governments are now considering how to move away from US services. They may not be able to do it quickly but it is a part of the conversation. Customers specifically request that new systems should be independent from US service providers.

In my view the damage has been done and will not go away even after Trump. The Europeans have realized that the only true allies they can trust with critical infrastructure are other European nations. That group used to include America; it no longer does.

[1] https://blogs.microsoft.com/on-the-issues/2025/04/30/europea...

[2] https://nltimes.nl/2025/05/20/microsofts-icc-email-block-tri...


"We cannot leave the security of Europe in the hands of voters in Wisconsin every 4 years" - French Minister Delegate for European Affairs.


Which is what Trump seems to want: a Europe not totally dependent on American taxpayers for its own self-defense and its own global power-projection agendas. Many countries, like Germany, are quite far from that. France is more of a leader in that regard.

Although abandoning commercial American software wholesale would likely degrade their security and GDPs even further than they already are.


Defence and power projection are extremely expensive, and tech is incredibly labor-intensive. Is the average EU citizen ready for 15 days of PTO and 50-60 hour work weeks?

I don't mean this as a slight, but I genuinely do not think the average European worker, who has at this point spent most of their career in a pretty cushy, worker-friendly environment, is going to be up for American-style death-race productivity. Or the European-style death-race productivity of centuries past, for that matter.

The average American worker works 500 more hours a year than their German counterpart. That is 62.5 more work days annually. Trying to close that gap will have people rioting. Never mind the cuts to social programs and bumping retirement age to boost defense spending. Double never mind avoiding Russian energy. Europe would need a wholesale societal rewrite, not just a few more bonds issued.

It's much less resistance to stick out 4 years and hope the US gets its sanity back.


Both the Biden and Trump administrations have generally opposed European efforts to reduce reliance on foreign arms - The "Buy European" strategy.

The Trump administration seems to be pursuing a dual strategy: publicly demanding Europe be more self-sufficient while simultaneously trying to ensure U.S. economic and strategic interests are not sidelined in the process. Basically "Pay us more for less". Let's see how the NATO summit goes at the end of this month.


I really don't think Trump has run that line with the US weapons manufacturers.

He got his goal of increasing EU weapons spending, and it was entirely monkey-pawed.


> Which is what Trump seems to want. A Europe not totally dependent on the American taxpayers for their own self defense and their own global power projection agendas.

If so, good because that's what I want. I want to have "peace, commerce, and honest friendship with all nations, entangling alliances with none" as Jefferson said. I consider it a great betrayal of the American people that past governments put us in a position where other nations are depending on us for security. Untangling us from that should be done as gently as possible, but IMO it should be done.



Are these LLMs going the way of search and social media? Optimize them for selling product; optimize them for engagement (to sell products). They become ad machines, and it's going to be really hard to tell. There is already evidence [1] that you can optimize LLMs for engagement. Hopefully open source can keep this in check... but I'm not optimistic.

[1] https://arxiv.org/abs/2303.06135 - Rewarding Chatbots for Real-World Engagement with Millions of Users

