Interesting set of points; the intent to move beyond sterile dashboards and engage in deeper, more meaningful conversations about system health is very welcome, especially at a time when most leaders don't bother reading about Goodhart's law before leaning on metrics.
But still, I spot a few points of concern.
- While experienced engineers develop valuable intuition, this can also be a source of significant bias. An engineer's "feeling" might be influenced by their personal comfort with a particular technology, their resistance to change, or their own role in creating the system in question (the "IKEA effect"). Over-relying on intuition can lead to subjective decision-making that isn't backed by evidence.
- What is "simple" for a senior engineer with years of context might be overwhelmingly complex for a new team member. Furthermore, some business domains are inherently complex, and attempting to impose a simplistic model can lead to a system that fails to capture the necessary nuance and ultimately creates more problems.
- Informal discussions can be dominated by the loudest voices, the most senior people in the room, or those with the most social capital. Junior engineers or those with dissenting opinions may not feel comfortable speaking up, leading to a skewed and incomplete picture of the system's health. A more formal "safe space" approach might help here, increasing participants' sense of psychological safety and leading to a better discussion.
- For large, legacy systems that have been in production for years, questions like "Can we explain the system's responsibility in plain English, within 5 minutes?" or "Do simple modifications you expect in hours, take many days?" can be demoralizing rather than constructive. They might highlight known, intractable problems without offering a clear path forward, leading to shame, anxiety and frustration.
1. I agree, numbers are important, and these intuitions and feelings should be backed by numbers. In the post too, I suggest looking at dashboards during such discussions.
2. My definition of simplicity is largely based on Rich Hickey's talk; I'd recommend it if you haven't seen it. I think it's possible to be somewhat objective about simplicity. If something is overwhelmingly complex to a junior, ideally a senior engineer is able to appreciate that complexity.
3. Yeah, the loudest-voice problem exists, like with any in-person discussion, I guess. Keeping discussions on Slack / Notion helps side-step it. Discussion rules with timers, going around the room, anonymous comments, etc. can also help.
4. A complex legacy codebase will, and should, fail the simplicity test, at least with respect to a new engineer's experience. And it would serve the team well to accept that and try to solve for it. Ruminating on any problem without moving towards a solution is frustrating, and can be demoralising, yes. And providing direction and creating momentum in that direction is a leader's job. In this blog post, I only offer questions, not answers :p.
Disaster stemming from the deadly combo of the Dunning-Kruger effect + network effects + recommendation algorithms floating similar BS up top.
Most who post absolute BS on such "professional" forums as a way to gain a larger "network" easily attract similar ones with relatable mediocrity index. The network effect kicks in and mediocrity gets amplified.
Everyone thus gets forced either to act insanely ridiculous or just GTFO to retain whatever little sanity they have left in them.
Most corporate jobs are just mediocrity-maxxing, absolute BS jobs. It is only natural that the most popular corporate SM amplifies and promotes BS and mediocrity at scale.
Don't know much about LatAm now but can comment on conditions in Asia.
In fact, internationalization and local-language input (especially for Indic content) is much neater and simpler on Linux, as I have seen. And in India/Indonesia/the Philippines especially, most SMEs/startups/mid-or-small banks simply use English itself and get the job done. Except for some PR-related stuff, all content and data that they manage is almost always in English only.
1. It took the end of the ZIRP era for people to realize the undue complexity of many fancy tools/frameworks. The shitshow would have continued unabated as long as cheap money was in circulation.
2. Most seasoned engineers know for a fact that any abstractions around the basic blocks like compute, storage, memory and network come with their own leaky parts. And that knowledge and wisdom helps them make suitable trade-offs. Those who don't grok them shoot themselves in the foot.
An anecdote on this. A small B2B SaaS startup was initially running all their workloads on cheap VPSs, incurring a monthly bill of around $8K. The team of 4 engineers that managed the infrastructure cost about $10K per month. Total cost: $18K. They made a move to the 'cloud native' scene to minimize costs. While the infra costs did come down to about $6K per month, the team needed a new bunch of experts who added about another $5K to the team cost, making the total monthly cost $21K ($6K + $10K + $5K). That, plus a dent to the developer velocity and the release velocity, along with long windows of uncertainty with regards to debugging complex stuff and challenges. The original team quit after incurring extreme fatigue, and the team cost alone has now gone up to about $18K per month. All in all, a net loss plus undue burden.
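The arithmetic above, as a quick back-of-the-envelope sketch (figures are the ones from the anecdote; the category labels are mine):

```python
# Rough monthly TCO comparison using the figures from the anecdote.
# All values in USD per month.
before = {"infra": 8_000, "team": 10_000}
after = {"infra": 6_000, "team": 10_000, "cloud_experts": 5_000}

tco_before = sum(before.values())
tco_after = sum(after.values())

print(f"TCO before: ${tco_before:,}")                # $18,000
print(f"TCO after:  ${tco_after:,}")                 # $21,000
print(f"Net change: ${tco_after - tco_before:+,}")   # +$3,000 per month
```

A ~$2K/month infra saving bought a ~$5K/month payroll increase, before even counting the hit to velocity.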
Engineers must be tuned towards understanding the total cost of ownership over a longer period of time, in relation to the real dollar value achieved. Unfortunately, that's not a quality commonly seen among tech-savvy engineers.
Being tech-savvy is good. Being value-savvy is way better.
Thanks for sharing the story. Despite the overall TCO being higher, I wonder how the $8K-to-$6K reduction happened.
On AWS, Fargate containers are way more expensive than VMs, and non-Fargate containers are kind of pointless, as you have to pay for the VMs where they run anyway. Also, auto-scaling the containers (without making a mess) is not trivial. Thus, I'm curious. Perhaps it's Lambda? That's a different can of worms.
As said, most of their workloads were on cheap VPSs before. Moved some to 'scale-to-zero' solutions, reduced the bloat in the VMs, fixed some buggy IaC, and also moved some stuff to the serverless scene. That got a decent ~25% reduction.
Once a capable AI comes along that can make a "Two Brothers" trailer like this one below, humanity will have some serious issues to tackle, other than climate change ofc! :D
> there is a great opportunity to help businesses manage their software supply chain
Yes, very much. There are so many layers, components, and intricate relations between them that go totally ignored today, at least in most places, because doing it properly is an insane amount of work. Only BigCos can afford dedicated teams for 's/w supply chain management', considering the cost parity with returns. However, the solution on this end that works for a BigCo doesn't necessarily work for SMEs & startups. That gap isn't small, if I'm right.
> Another product that is often requested is a visual DAG debugger. When a pipeline break, you want to know why, and staring at your CI logs is definitely not the best experience for that. With a web UI, there's a lot we can do there.
Yes. This definitely helps. But more than a viz DAG element, people look for an early warning of a failure. The most common build-failure reasons (other than failed tests): expired creds used somewhere in the pipeline, provisioning that failed or timed out, or a problem in some dependent module totally outside the org's control (some OSS dep). People seem to be equally bothered about how to squash 'em, not just where to squash 'em. Locating where the pipeline broke is only half the job. Actionable insight into how that pipeline can be healed is the hard part. And considering the diversity of the ecosystem, that's gonna be a wild ride.
BTW, are you folks hiring? "DevOps OS for enterprises" seems very very enthralling, esp for an old toolmaker.
1. If identity providers start offering a dynamic, trusted element within the critical pages (login, password prompt, 2FA/OTP verification, etc.), and
2. if such a dynamic element is drawn from a known range/set of customer/trusted-party-supplied identity elements.
Ex. During my account creation, say I am prompted to select some "secret identity themes", and I choose { batman, bike, carrots }
At the login/password/OTP prompt, I am shown a 3x3 grid of pics / words / hints, of which at least 3 (or whatever configurable number, per my account preferences) are somehow connected to my "secret identity theme". This way, I know I can trust this page. The grid also has many unrelated ones acting as decoy elements, so that any malicious spoofing party cannot really figure them out.
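A minimal sketch of how the server side of that grid could work (the theme words are the example above; the decoy pool and function names are made up for illustration):

```python
import random

# The user's "secret identity theme", picked at enrollment (example above).
user_theme = {"batman", "bike", "carrots"}

# A large server-side pool of unrelated decoy words (hypothetical).
decoy_pool = ["kettle", "comet", "violin", "harbor", "tulip", "ladder",
              "falcon", "marble", "anchor", "prism", "canyon", "ember"]

def build_grid(theme, decoys, size=9, min_theme=3):
    """Return a shuffled grid with at least `min_theme` theme elements,
    padded with decoys. A spoofed page that doesn't know the user's
    theme can't reproduce a grid the user will recognize."""
    cells = random.sample(sorted(theme), min_theme)
    cells += random.sample(decoys, size - len(cells))
    random.shuffle(cells)
    return cells

grid = build_grid(user_theme, decoy_pool)
```

The user verifies the page by spotting their 3 theme items among the 9 cells; an attacker guessing blindly from the decoy space has no reliable way to assemble a convincing grid.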
I believe you get the general idea.
Do y'all feel this can possibly help, in mitigating this very serious & very harmful threat?
The Barclays Android banking app gets you to choose a few words that you make up, and displays those words on the login screen as a way of authenticating to you that it actually is the Barclays app login screen.
I remember some big service many years ago (maybe Yahoo?) had a "memorable image" or something that was associated with your username as some kind of anti-phishing measure. Of course, nowadays that would be trivial to bypass with something like Modlishka or a different reverse proxy passing through the website content.
Yes. That's why a cluster of elements for a "secret identity theme", instead of just one image. (After all, infosec/security is ultimately just a game of making the reward-to-effort ratio too impractical for most threat actors, and thus achieving a reasonable 'sense of security', in a world where exploits exist for almost every ring in the stack, including ring 0.)
I feel BITB mostly gets used by those who don't really have the means to lob a proxy attack at the intended target as well, which filters out a good share of the attacks potential victims face.
I think the concern (if you ever see this comment) is that an attacker will, for instance, put the fake browser UI around an iframe to a proxy to the legitimate website content, using a tool like Modlishka. In that case, whatever is presented to the user in the legitimate application (including whichever superheroes or whatever are selected that time around) and all of the bogus images will be presented in the proxied version. Transparent proxies like that are very effective ways of doing phishing, because you can phish 2FA or even SSO or similar info by just passing a legitimate login page on to the user, but through your MITMed page.
Yes, I understand that BITB+MITM is a huge risk. But my point was that most who want to run BITB won't typically have the means to run an MITM along with it. (unless 'MITM within a browser' becomes a reality!)
I was trying to say that the dynamic security element helps in filtering at least the most common kind of attack, which otherwise leaves consumers to bear a very large risk.
Perhaps this is the thing that I don’t understand. Why wouldn’t an attacker have such means? This attack isn’t something that requires control of the network, it’s just a fantastic way of producing a lookalike page.
Cloudflare Mumbai, Bengaluru, Chennai, Hyderabad edge-nodes also unable to serve content.
x.com down.
A few quick-commerce apps are acting up at times.