Hacker News | dub's comments

Typically the price of not having horizontal scaling is felt more by the engineers than the users, at first:

- Data migrations, schema changes, backfills, backups & restores, etc., take so long that they either cause outages, risk causing them, or simply waste a ton of engineer time waiting for operations to complete. If you have serious service level objectives around time to restore from backup, that alone can be a forcing function for horizontal sharding (doing a point-in-time restore of a 40TB database while dropping some unwanted bad DELETE transaction from the transaction log is going to be very slow and cause a long outage).

- The lack of fault isolation means that any rogue user or process making expensive queries impacts performance and availability for all users, versus being able to limit the unavailability to a single shard.

- When people don't have horizontal scalability, I've seen them normalize things like not using transactions and not using consistent reads even when both would substantially improve developer and end-user experience, with the explanation being a need to protect the primary/write database. It's kind of like being in an abusive relationship: you internalize the fear of overloading the primary/write server and start to think it's normal not to be able to consistently read back the data you just wrote at scale or not to be able to use transactions that span multiple tables or seconds as appropriate.


To your first point: I find that in these discussions, the "just buy a bigger server" crowd massively underestimates the operational problems with giant DB servers. They are no fun to babysit, and change gets really hard and tedious when you're trying not to accidentally bring the whole thing down. It becomes a massive drain on the velocity and agility of the business.


Giant sharded DB clusters aren't that much more fun or less precarious... Ever run Cassandra or Clickhouse at scale?

IME vertically scaled replicas/hot standbys are a lot more stable to operate if your requirements allow you to get away with it. OTOH, you'd better already be prepared for if/when you hit scaling limits.


As the article says, the vulnerability was fixed in April and the people who discovered it have already been rewarded under Google's Vulnerability Reward Program. Google also proactively detected the problem before being notified by the researchers.


The reasons Vitess didn't have foreign key support historically actually weren't a bold performance tradeoff or anything like that. It was more of a classic, boring, backlog prioritization thing: everyone heavily using Vitess was using gh-ost for schema changes, and gh-ost didn't support foreign keys.

Now that Vitess has native schema change tools it's more reasonable to revisit user-friendly, out-of-the-box foreign key support.


Kind of. It was definitely an opinionated choice at some point, justified on performance grounds. Now it's just something we need to add, and it's nearly done. It will still have trade-offs associated with it, like all distributed systems.


> Obviously not true, in fact none of the companies I worked in that was the case

I once offered a bet to the large security team at a well-known decacorn tech company I worked at: a personal, reasonably sized cash bet with any member of the security team, which I would win if I could deploy malicious, unreviewed code to any service or machine of their choice without it being prevented or proactively noticed by them.

The members of the security team all declined my bet. We're talking about a team of probably at least a dozen people, many of whom had been working at the company far longer than I had and who had been shaping and reviewing the company's security design for years.

They knew perfectly well that I would be able to win the bet. Not because their security was unusually bad, but because it was bad in the common, usual ways. Securing the supply chain is hard, and real security is almost impossibly expensive to add to a system late in the game if you didn't design it in from the beginning.


Or maybe they simply didn't want to risk personal money on some bet about the state of security at their job. I wouldn't take the bet even if I thought the security was good.


If you're not even willing to make a bet for a single signed dollar, that doesn't speak highly to your confidence in your work.

It's fine not to be confident, but when professional security teams at large companies are afraid to express confidence that their systems are non-trivial for a random engineer to hack in their free time, that seems at odds with the claim that it's "obvious" that permission escalation is hard.


Making such a bet is not really a professional thing to do, regardless of the actual risk it introduces. If I were a manager in that company and two of my employees made such a bet, I'd be tempted to fire both or, at the very least, have a very serious conversation. I think that's borderline malpractice.


When I worked at Google back in the day, we used to make dollar bets all the time. You'd tape the signed dollars you won to your monitor.

A willingness to take pride in your work and to not take it too seriously when smart, well-intentioned people make mistakes (e.g. blameless postmortems) is part of the culture difference that led to Google's engineering becoming so exceptional and innovative vs the more corporate, don't-rock-the-boat, fear-driven culture that the traditional businesses had at the time.


The second paragraph seems at odds with the first. I'd describe a culture where people make bets on whether you can find a bug in someone else's work as the opposite of blameless. I'd consider it quite hostile, to be honest. Especially if it's something that management is actually OK with.

I'm assuming you were at Google in the late '90s or early 2000s?


>If you're not even willing to make a bet for a single signed dollar, that doesn't speak highly to your confidence in your work.

I've long thought that one should have the attitude (and act to make it so) that one would be willing to bet one's job on the quality of one's work, but not necessarily actually do so.

And betting anyone (co-worker or not) that they can't compromise the systems (especially, but not limited to production systems) you're tasked with keeping from compromise is a bad bet -- even if you win.

I'd class that sort of behavior as having serious potential to be a "Career Limiting Move" (CLM).


Yah, so they'd have to pay out on a bet and then become unemployed. That seems really smart. Never gamble on anything that is 100% correlated with your primary source of income.


> A surprising number of systems exhibit this behavior, sadly.

I noticed [0, ∞] delivery semantics in a widely-used, internal/homegrown message delivery system at a big tech company once. The bug was easy to spot in the source code (which I was looking at for unrelated reasons), but the catch-22 is that engineers with the skills to notice these sorts of subtle but significant infra bugs are the same engineers who would've advised against building (or continuing to use) your own message delivery system in the first place when there are perfectly serviceable open source and SaaS options.


I think whether or not to build your own thing isn't an obvious choice for a big company. You might have lots of other infrastructure to integrate with, and adapting an existing solution might not work as well as making something from scratch that, e.g., integrates with your storage layer or how you manage permissions. The choice may be between having a team dedicated to managing some third-party thing (because in the real world these systems have bugs, or cases that don't perform well, or need more hardware or rebalancing or whatever) and having a team dedicated to developing and managing an internal solution. The latter case can mean you get to take advantage of existing infrastructure, have less work integrating with this thing, and have in-house experts who can investigate and fix bugs.

I don’t think it’s as simple as always preferring to bring in external things.


While I haven't benchmarked JSON vs protobuf, I've observed that JSON.stringify() can be shockingly inefficient when you have something like a multi-megabyte binary object that's been serialized to base64 and dropped into an object. As in, multiple hundreds of megabytes of memory needed to run JSON.stringify({"content": <4-megabyte Buffer that's been base64-encoded>}) in Node.
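A rough way to reproduce the shape of the problem in Node (the sizes here are illustrative, not a benchmark):

```javascript
// Illustrative sketch: JSON.stringify must scan the whole base64 string for
// characters to escape and then build a second full copy of it in the output
// string, so peak memory is a multiple of the payload size.
const payload = Buffer.alloc(4 * 1024 * 1024, 0xab); // 4 MB of binary data
const b64 = payload.toString('base64');              // 4/3 expansion: ~5.6 MB string
const json = JSON.stringify({ content: b64 });       // yet another full copy
console.log(json.length > b64.length); // true: output strictly larger than the payload
```

Watching RSS while running this with larger payloads makes the multiple-copies effect visible.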


What kind of sicko embeds a whole binary in JSON ?


JSON is the default serialization format that most JS developers use for most things, not because it's good but because it's simple (or at least seems simple until you start running into trouble) and it's readily available.

Large values are by no means the only footgun in JSON. Another unfortunately common gotcha is encoding an int64 from a database (often an ID field) as a JSON number rather than a JSON string: JS numbers are IEEE 754 doubles, so integers above 2^53 silently lose precision.
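A quick illustration in Node (the ID value is hypothetical, chosen as 2^53 + 1):

```javascript
// A 64-bit integer just above Number.MAX_SAFE_INTEGER (2^53 - 1).
const id = 9007199254740993n;          // hypothetical int64 ID from a database
console.log(String(Number(id)));       // "9007199254740992": silently off by one
// Encoding the ID as a JSON string round-trips exactly:
const roundTripped = JSON.parse(JSON.stringify({ id: id.toString() })).id;
console.log(roundTripped);             // "9007199254740993"
```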

A more thoughtful serialization format like proto3 binary encoding would avoid both the memory spike issue and the silent loss of numeric precision issue, with the tradeoff that the raw encoded value is not human readable.


Isn't HTTP POST content similarly encoded? Likewise with small embedded images in CSS, though I am rusty on that topic. Likewise with binary email attachments in SMTP (though this may be uuencoded, same net effect).

The particular example of a trivial message that is mostly-binary just sounds like a useful test case, more than anything else.


Asshole coders, perhaps.


> What kind of brave soul wants to trudge through and maintain log4j in their spare time for zero compensation?

It's not clear to me as an outsider what exactly the Apache foundation is doing for these projects. It feels like Apache is willing to accept code donations from anyone and is willing to attach the foundation's name to code that isn't widely used, actively maintained, or may just be abandonware.

I have soooo much more confidence in CNCF projects. The criteria for graduating as a CNCF project include requirements that your project be in use by multiple real companies, have maintainers who are (paid) employees of multiple different companies, and undergo a professional security audit.


> It feels like Apache is willing to accept code donations from anyone and is willing to attach the foundation's name to code that isn't widely used, actively maintained, or may just be abandonware.

That’s incorrect. Projects need to report quarterly and need a Project Management Committee of at least three people, or they are retired. Retired projects may not make releases.

(Source: past ASF board member, who used to review those reports each month.)

There are a fair number of retired projects, and others that may become retired within the near-to-medium term. The ASF has been around for a while, and every software project has a life cycle. Those are still associated with the ASF brand because of Google; whatcha gonna do? An explicit retirement policy overseen by a board is still superior to how the vast majority of open source projects approach end-of-life.


In theory. Open Office shows that the process of retiring semi-abandoned projects leaves a lot to be desired.

The project has few, if any, volunteers, and there are security problems known to be actively exploited, yet the ASF is not willing to work to find a viable solution.


Open Office losing popularity and having a shortage of developers makes some sense to me given all the progress in web-based document editors.

Something I have a harder time understanding is how it came to be that Apache Thrift and Facebook Thrift both exist as competing implementations of the same software originated by the same company.


The implied point with Open Office was not that users' habits shifted, but that there was effectively a fork in name only. The project is still under active development with a diverse set of developers, just under the name LibreOffice.

Only a skeleton crew of paid developers stayed with Open Office, enough to cut releases regularly but not enough to fix the security issues being actively exploited. All the distributions moved with the developers, but there is a discoverability problem, which has led to mostly Windows users continuing to install the unmaintained version.

The ASF could have fixed this quickly, either by helping out with the trademark issues, moving with the developers, or at least moving the unmaintained version to the attic and steering new users towards the actively developed version.

But they collectively decided to sit on their hands as users continued to install unmaintained software rather than take the slightest risk of offending one of their members. From an outside perspective, all of this was completely unnecessary.

The Thrift situation is another example where some active stewardship could have made a difference.


Apache is what CNCF will become when marketing budgets move on.


>It feels like Apache is willing to accept code donations from anyone and is willing to attach the foundation's name to code that isn't widely used, actively maintained, or may just be abandonware

That's why I'm allergic to Apache software. A lot of it is overengineered, insecure, legacy abandonware.


I'd be more excited to use GPT to draft a summary of release notes by scanning all the new PRs in a release, summarizing what they are, and dividing them up into categories (bug fix, feature, breaking changes, etc.)


I've been using conventional commits for that; if there were an equivalent of this that conformed to the CC standard, I'd give it a try.
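As a sketch of the non-LLM part, here's one way to group conventional-commit-style PR titles into release-note sections (the function and section names are made up for illustration):

```javascript
// Sort conventional-commit-style titles into release-note sections.
// A "!" after the type/scope marks a breaking change per the CC spec.
function groupByType(titles) {
  const sections = { feat: [], fix: [], breaking: [], other: [] };
  for (const title of titles) {
    const m = title.match(/^(\w+)(\(.+\))?(!)?:\s*(.+)$/);
    if (!m) { sections.other.push(title); continue; }
    const [, type, , bang, desc] = m;
    if (bang) sections.breaking.push(desc);
    else if (type === 'feat') sections.feat.push(desc);
    else if (type === 'fix') sections.fix.push(desc);
    else sections.other.push(desc);
  }
  return sections;
}

console.log(groupByType([
  'feat: add dark mode',
  'fix(api): handle null ids',
  'refactor!: drop v1 endpoints',
]));
```

An LLM pass could then summarize each section, rather than doing the categorization itself.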


> Anything that isn't in the "happy path" of the AdsUI probably gets handled by some engineer making some API calls to a prod API

Prior to going private, Twitter would have had recurring Sarbanes-Oxley audits. Auditors understand the need for occasional emergency break-glass methods of making manual database queries or API calls, but they are less tolerant about that being a normal way of operating.

Plus, if you use emergency access often you'll eventually waste more time explaining each individual access to auditors at the end of the quarter than it would have taken to just implement a UI for the feature in a code-reviewed and audited internal admin console or user-facing UI.


It's been half a year since terra/luna crashed but the name lives on at Nationals Park: https://www.mlb.com/nationals/tickets/premium/nightly/terra-...

