Hacker News | metalspot's comments

corporations and governments fund most Linux development. for hardware companies software cost is a tax that decreases their revenue and profit, so Nvidia and AMD have strong incentives to support open source models, which they are doing, very actively.

The key thing here is that effective intelligence = model capability / cost. If you drive down the cost of inference you can have higher effective capability even with a technically less capable model. There is nothing in Anthropic's or OpenAI's general reasoning capabilities that can't easily be done much better with a purpose-built harness for a domain-specific task.
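As a rough illustration of the ratio above, here is a minimal sketch. The function name `effective_intelligence` and every number in it are hypothetical, chosen only to show how a cheaper, weaker model can come out ahead on this metric; they are not measurements of any real model.

```python
def effective_intelligence(capability, cost):
    """Ratio from the comment above: model capability divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return capability / cost


# Hypothetical numbers, for illustration only: capability is an arbitrary
# benchmark-style score, cost is USD per million tokens.
frontier = effective_intelligence(capability=90.0, cost=15.0)
cheap = effective_intelligence(capability=75.0, cost=0.5)

print(f"frontier model: {frontier:.1f}")
print(f"cheap model:    {cheap:.1f}")
```

Under these made-up inputs the less capable model scores higher because its cost is so much lower, which is the point the comment is making.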

> Fable was the strongest model on the market

based on Anthropic's own self promotion. no reason to think that Chinese models are not just as good or better. the key thing here is training on machine code and disassembled binaries, and the Chinese have a complete data set of pirated software, with no limitations on how they use it. I seriously doubt they are actually behind.

> only if you're not a US citizen, but in practice, even if you are

the issue here is that Anthropic needs a legal opinion that their mechanisms for detecting foreign users in the US are compliant, which is technically hard to do, and a complex intersection of technical details and national security law, so getting a legal opinion can't happen overnight. it will be back.


i have been using deepseek-v4-flash since it came out. i use a highly structured harness and spec/test driven workflow running through opencode, and so far there has been nothing it can't do.

i have run through a bunch of tests: re-writing vvenc with assembly kernels, creating the first generation agent harness integration with opencode, porting TS npm modules to C++, porting an entire TS server app to C++, creating a new pure io_uring http server with zero-copy (325K RPS single core), creating a second generation agent from the ground up in C++, setting up a dev environment for custom kernel development on tenstorrent accelerators using tt-metal and ttsim.

i consistently get 98.5% input cache hit ratio. i do see noticeable degradation in performance in the 400-500K context range, so i always try to wrap up sessions by 500K max.

a non-intuitive thing is that the model is very good at low-level systems engineering. i suspect this is because they are internally using it to port their stack to huawei hardware. it can churn out exceptionally complex low level C++ stuff that blows your mind, and then completely choke and run in circles on other seemingly simple tasks.

i only use flash and not pro because i want my tooling to be portable to open weights models that are practical to run. i use deepseek platform and not the open weights models for development, because it is subsidized, and based on observation, i think it is highly likely that they are running some proprietary features on the platform which are not in the open weights model.

it will be very interesting to see what their next point release looks like. the compounding effect of optimizing inference cost and then feeding back inference into training should lead to rapid and accelerating improvement, but only time will tell.


Thanks for the details. What's a second generation agent?

You mentioned the workflow is heavy on specs and tests. The smaller models seem to be really good at following instructions now. (Well, some of them!)

So that's probably part of why you're seeing good results. It has a very clear target.

Whereas with more open ended instructions they seem to struggle more. I think common sense is the main thing you get with model size.

When I'm working with the big models I feel like I don't have to spell things out so much. The gap is closing, but I'm assuming there is some fundamental limit there based on the size.

Of course the ideal would be Mythos, running for free, in my house, at 1,000 tok/s ;) Someday...


> What's a second generation agent?

i meant that i initially developed an agent harness as a set of skills integrated with opencode and now i am in the process of using that to write a new agent from scratch to replace opencode.

> probably part of why you're seeing good results

yes. i think tests and feedback loops for diagnosing errors (logs, debugging, etc) are the most important things. in my experience deepseek-v4-flash tends to ignore instructions to use these tools and defaults to churning through code and guessing the cause of errors, and those guesses are often wrong. so it occasionally requires stepping in when it has been grinding fruitlessly for a while and reminding it. this is probably due to context length and sparse attention forgetting instructions that were put in context at the beginning of a session.


Thank you a lot for such an insightful comment. The low-level part, including porting entire codebases using DV4Flash, came as a genuine surprise to me. I did not expect it to be this good.

When you say "i use a highly structured harness" ... can you please tell me what it is exactly?



Thanks..

> glorified autocomplete machine

It is a next token prediction function and it is important to understand the technology accurately based on what it actually is.

What is unique about a next token prediction function though is that every computer program is just a string of instructions. At the theoretical limit a next token prediction function can generate the entire instruction stream (boot loader, OS, application) so a next token prediction function can theoretically generate any computer program, which means that it is a universal predictor for anything that a computer can simulate. Still not AGI/ASI in the woo-woo non-technical interpretations of those terms, but incredibly powerful.


What you’re saying is correct if the model is trained with all the knowledge humanity had, has, and will ever produce. But at the moment next token prediction is quite limited to the training data.

Things could change if the model supports reinforcement learning. That way the LLM would change its weights in real time based on a feedback loop, but that could either vastly improve the quality of the token prediction or completely degrade it.


The distinction I would make here is that computer code is logical transformations on arbitrary data, not the actual data itself. An LLM can learn the entire space of logical transformation patterns from existing code, and can hallucinate new logical transformations, using a computer as a validator for the logic, so an LLM can create new logic as well as repeat existing patterns, and that logic can be applied to novel input data that the LLM has never seen before.
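A minimal sketch of the generate-then-validate idea described above. The three `candidate_*` functions are hypothetical stand-ins for model-proposed logic (a real setup would get them from an LLM), and the test cases play the role of the computer acting as validator. The accepted logic is then applied to input that appears nowhere in the tests.

```python
# Hypothetical stand-ins for model-proposed transformations; a real system
# would obtain these from an LLM rather than a hard-coded list.
def candidate_a(xs):
    return xs  # wrong: leaves the input unchanged


def candidate_b(xs):
    return sorted(xs, reverse=True)  # wrong: sorts descending


def candidate_c(xs):
    return sorted(xs)  # correct: sorts ascending


# Specification expressed as (input, expected output) pairs.
TEST_CASES = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]


def passes_tests(fn, cases):
    """Validator: run the candidate on every case and compare results."""
    for given, expected in cases:
        try:
            if fn(list(given)) != expected:
                return False
        except Exception:
            return False
    return True


def first_valid(candidates, cases):
    """Return the first candidate that passes all cases, or None."""
    for fn in candidates:
        if passes_tests(fn, cases):
            return fn
    return None


accepted = first_valid([candidate_a, candidate_b, candidate_c], TEST_CASES)
novel_input = [42, -7, 19, 0]  # data that appears in none of the tests
print(accepted.__name__, accepted(novel_input))
```

The validator never needs to know how a candidate was produced, only whether it behaves correctly, which is why guessed logic can still be accepted or rejected mechanically.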

That’s not how LLMs work at the moment, as far as I understand. An LLM would not hallucinate any new logical transformation; rather, it would just predict a transformation from its training data.

I understand that there can be many different combinations of all the logical transformations in the training data. But the number of combinations is still finite, and I would assume that a large number of those combinations would not result in any meaningful outcome.

The best outcome is that it predicts a new pattern we haven’t discovered (the LLM randomly connected the correct dots). One example is protein folding.


> If you deploy 10x faster, than me as business owner need less of you for the same amount of work

An important consideration here is that velocity is not zero sum. If you are delivering in weeks what used to take months you are creating an entirely new realm of what is possible to do with software within a corporation.

In the real world, I have never worked for a company that doesn't have a huge backlog (either tracked or in engineers' heads) of work that would never be done because it wasn't economic under the old model. This tends to apply to the internal work of engineers (developer tooling, infrastructure, tech debt, etc) more than anything else. 10X faster doesn't necessarily mean shipping 10X product code. You can use that productivity boost to accelerate prototyping, ship betas faster, move the iteration loop faster, all while shipping higher quality code with less tech debt and having the time to continuously improve the engineering side of things that the business never sees.


Most companies aren't software houses.

If you fulfill your delivery contract in half the time, great for me, you now need to track down another customer.

Or, put another way, an agency now only needs a third of the previous team size to deliver the same amount of work.

The other two thirds might be lucky enough to have another project assigned, or they get to sit on the bench and, depending on the world region (offshoring shops), get their salary halved before being fired if they sit on the bench too long.


> people are driven by different things

This is important to understand. I have been coding since I was 11 when I got my first C64, and it is a genuine passion for me, but I also love working with LLM tooling.

One of the biggest things for me is that after decades of sitting in front of a computer I have chronic back and wrist pain that makes it impossible to do the long deep focus sessions that were normal when I was younger. Using AI tooling to handle all of the procedural tasks (running tests, debugging, managing git, etc) dramatically reduces the physical strain of programming, and allows for a much healthier workflow, with regular short breaks.


That's awesome, I'm happy for you finding such great value in it!

Not sure if it's important or not, but for the sake of OP's discussion I note that your value is not necessarily tied to "speed of execution".


Calling it AI is where the problem lies. An LLM is not AI. It is a next token prediction function. That is a very powerful function but just one function out of millions in the overall stack. As an engineer you still have to have the right framework to call that function with the right inputs at the right places and validate the results. But if you focus on the technical details and not the marketing hype you can get amazing results in the areas where it works.

AI accelerates the entire white collar economy and makes it more efficient. It has and will continue to result in large job gains for blue collar and service workers. The people getting rich off AI have to spend their money somewhere.

> It has and will continue to result in large job gains for blue collar and service workers.

Source? Specifics? This doesn't even sound plausible at face value. Even if it is somehow true, higher paying white collar jobs being replaced with service jobs that pay far less and have way worse conditions is not a positive or even a neutral outcome.

> The people getting rich off AI have to spend their money somewhere.

The amount of wealth hoarding already going on says otherwise. Buying yachts and islands does not magically offset millions of jobs being lost.


> The people getting rich off AI have to spend their money somewhere.

That's demonstrably false.

If it were remotely true, trickle down economics would have been a gold rush for the entire economy.


Not sure how generating exponentially more boilerplate is going to make anything more efficient. I guess we'll find out.

I am getting 98.6% cache hit ratio on deepseek-v4-flash with opencode


That’s impressive!

On sheer performance, is it comparable to Opus?


Here are my stats (from DeepSeek directly, with a script I wrote). The prices are what equivalent Sonnet usage would have cost, the actual amount I paid was $10. On performance, DeepSeek V4 Pro is comparable to Sonnet for me.

    ./cost.py amount-2026-5.csv 0.3 3.75 15
    input_cache_hit_tokens: 472,971,520 tokens -> $141.8915
    input_cache_miss_tokens: 13,299,013 tokens -> $49.8713
    output_tokens: 3,334,962 tokens -> $50.0244
    cache hit rate: 97.27% (472,971,520/486,270,533)
    cache miss rate: 2.73% (13,299,013/486,270,533)
    total: $241.7872

All of this usage was with an OpenCode subagent exclusively.
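For anyone who wants to reproduce the arithmetic in that output, here is a minimal sketch. It is not the commenter's actual `cost.py`, which isn't shown in the thread; the function name `cost_breakdown` and its parameters are assumptions, and it skips CSV parsing and takes the token totals directly. Prices are read as USD per million tokens, which is what the three command-line numbers appear to be.

```python
# Sketch of a per-category cost breakdown; NOT the commenter's cost.py.
PER_MILLION = 1_000_000


def cost_breakdown(hit_tokens, miss_tokens, output_tokens,
                   hit_price, miss_price, output_price):
    """Return dollar costs and cache hit rate; prices are USD per 1M tokens."""
    hit_cost = hit_tokens * hit_price / PER_MILLION
    miss_cost = miss_tokens * miss_price / PER_MILLION
    output_cost = output_tokens * output_price / PER_MILLION
    total_input = hit_tokens + miss_tokens
    return {
        "hit_cost": hit_cost,
        "miss_cost": miss_cost,
        "output_cost": output_cost,
        "total": hit_cost + miss_cost + output_cost,
        "hit_rate": hit_tokens / total_input if total_input else 0.0,
    }


# Token counts and prices copied from the output quoted in the thread.
stats = cost_breakdown(472_971_520, 13_299_013, 3_334_962, 0.3, 3.75, 15)
for name in ("hit_cost", "miss_cost", "output_cost", "total"):
    print(f"{name}: ${stats[name]:.4f}")
print(f"cache hit rate: {stats['hit_rate']:.2%}")
```

Note that here the hit rate is hits divided by hits plus misses, since the DeepSeek figures only distinguish those two input categories.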


out of curiosity, how do you measure cache hit rate in opencode?


opencode stats


So the calculation is:

Total input tokens = input + cache read + cache write

Cache hit rate = cache read / total input tokens

That is 71% in my very limited use of opencode.
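The same calculation as a minimal sketch. The parameter names are assumptions standing in for the three token counts mentioned above, and the sample numbers are hypothetical, picked only so the result lands on 71%.

```python
def cache_hit_rate(input_tokens, cache_read_tokens, cache_write_tokens):
    """Cache hit rate = cache read / (input + cache read + cache write)."""
    total_input = input_tokens + cache_read_tokens + cache_write_tokens
    if total_input == 0:
        return 0.0  # no input yet, so there is no meaningful rate
    return cache_read_tokens / total_input


# Hypothetical token counts, for illustration only.
rate = cache_hit_rate(input_tokens=200, cache_read_tokens=710,
                      cache_write_tokens=90)
print(f"cache hit rate: {rate:.0%}")
```

Because cache writes count toward the denominator here, a short session that spends most of its tokens populating the cache will show a lower rate than a long one that keeps re-reading it.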


The first

