This phenomenon (or a closely related one?) is recognized and known as Kotov Syndrome in the context of chess.
A summary, courtesy of chess dot com:
> The name of this "syndrome" comes from GM Alexander Kotov, author of the classic chess book Think Like a Grandmaster. In the book, Kotov described an incorrect yet very common calculation process that often leads players to select a suboptimal or bad move.
> According to Kotov, in positions where the lines are complex and there are numerous candidate moves and variations to calculate, it's easy to make a hasty move. A player in that situation might spend too much time going over two moves and all of their ramifications without finding a favorable ending position. In that process, the player is likely to go back and forth between the two different lines, always coming to the same unsatisfying conclusion—this wastes precious mental energy and time.
> After spending too much time evaluating the first two options, the player gives up the calculation due to time pressure or fatigue and plays a third move without calculating it. According to the author, that sort of move can cause tremendous blunders and cost the game.
Ah yeah, the general sentiment was definitely "there are laws that everyone breaks every day, so they can always get you on something". Mostly because I remembered the anecdote about the USSR outlawing fax machines that no business could do without, so they could always charge any business with a crime.
Not exactly a true greybeard though, given that the author has "more than 10 years of experience", and judging by the picture on his GitHub page ;) [0]
Good grief. Back in my day, people used to mark an X in Sharpie on the top of their card decks so when they (ineluctably) got dropped it'd be easier to reassemble them.
Games couldn't rely on patches back in the day, because they used to be shipped in boxes, and providing patches was tricky. (The ease of providing updates may indeed incentivize releasing half-baked software nowadays).
I am less sure that those careless young programmers endlessly scrolling TikTok videos are fervently dedicated to Martin Fowler's supposedly misguided teachings.
And even less sure if the author summarizes them in a fair manner to begin with: "Many such organizations will build microservices even if their software domain complexity is not high and the software itself is not projected to scale that much. They will do it because their God told them to do it", he snarks.
If you go to Fowler's website [0], however (under "Are Microservices the Future?"), he's not dogmatically advocating for microservices: "Despite these positive experiences, however, we aren't arguing that we are certain that microservices are the future direction for software architectures [...] not enough time has passed for us to make a full judgement [...] There are certainly reasons why one might expect microservices to mature poorly [...] One reasonable argument we've heard is that you shouldn't start with a microservices architecture. Instead begin with a monolith, keep it modular, and split it into microservices once the monolith becomes a problem."
Doesn't really fit the image of a God commanding you to build microservices regardless of the domain complexity and other factors. Perhaps he's changed his tune since, I don't know. A quote would help.
On the other hand, observations such as "from the usability perspective, the software must be user-friendly", or "the role of software has become increasingly important, to the point where our world now heavily relies on it", etc. are undeniably correct.
The list includes the neutrino (added to explain missing momentum in observed nuclear decays), the Higgs (added to explain mass which would otherwise not make sense in electro-weak theory) and arguably antimatter (added to fix incompatibility between special relativity and quantum mechanics).
Reality is more complicated than slogans would suggest. Sometimes adding new types of matter to our understanding of physics was the right choice, sometimes it wasn't.
This is how Neptune was discovered, for example. Astronomers noticed Uranus was moving "wrong" based on their physics of the time and worked out where Neptune should be based on that. When they pointed their telescopes there they found it.
What's good style in production code isn't necessarily good style in test code (and vice versa).
Good style in an SDK or a library is not necessarily the same as good style in a self-contained business app.
Just like suburbs and downtown can't be expected to look similar (whether in Montreal or elsewhere), style, too, is context-dependent, and there are more aspects to this context than just the language of choice.
PGN is a highly redundant format, but it has the inherent advantage of being human readable. The problem is interesting, but I think it falls on the side of "fun" more than "profit". Storage is cheap, and PGN files are still small. An average PGN is still below 1 kilobyte. So one movie in Blu-ray quality = about 20 million games. That's a lot. The practical problem is not storage, it's computation. Basically, querying the game database quickly. Compression gets in the way of that.
For example, I've just played a game, now I want to go through the opening and fetch all games from the database that went through the same initial moves/positions (that's not the same thing, as a game may arrive at the same position through a different order of moves; AKA transposition). Let's say, all the way until move 15 or 20, because it will only be at that point that a decent game finally becomes unique by deviating from all the recorded games in the database (AKA a novelty was played).
Or I want to find all games where an endgame of a Queen and a pawn against a lonely Queen occurred. There is actually a query language for that, named (surprise, surprise) Chess Query Language: https://www.chessprogramming.org/Chess_Query_Language
I feel that whatever a superior alternative to PGN might be, its strength would likely be better queryability rather than higher storage efficiency as such.
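To make that concrete, here's a minimal sketch of a position-keyed index (a toy board model stands in for real chess; actual implementations typically use a Zobrist hash of the full position): two games that reach the same position through different move orders land under the same key, so transpositions come back from a single lookup.

```python
# Sketch: indexing games by *position* rather than by move sequence, so
# transpositions (different move orders reaching the same position) are
# found by the same lookup. The toy board here is a stand-in for real
# chess; in practice you'd key on a Zobrist hash of the position.
from collections import defaultdict

START = {"e2": "P", "d2": "P", "g1": "N"}  # tiny fragment of the initial setup

def position_key(board):
    """Canonical, order-independent key for a position."""
    return frozenset(board.items())

def play(moves):
    """Apply (from, to) moves and yield the position key after each one."""
    board = dict(START)
    for frm, to in moves:
        board[to] = board.pop(frm)
        yield position_key(board)

# Build the index: position key -> set of game ids that reached it.
index = defaultdict(set)
games = {
    1: [("e2", "e4"), ("g1", "f3")],  # 1. e4 ... 2. Nf3
    2: [("g1", "f3"), ("e2", "e4")],  # 1. Nf3 ... 2. e4  (transposition)
}
for game_id, moves in games.items():
    for key in play(moves):
        index[key].add(game_id)

final = position_key({"e4": "P", "d2": "P", "f3": "N"})
print(index[final])  # both games, despite different move orders
```

A move-sequence index would file these two games under different prefixes; the position key is what makes "fetch all games that went through this position" a single lookup.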
The problem I’m facing with storing roughly 600 million shorter PGNs is that the database is 100GB or so, and I’m grabbing thousands of them sort of at random. This makes the query IO bound, even though finding the pages they’re on is virtually instant with the indexes. So a smaller database means fewer pages read when I do these large reads, ideally. I also have other ideas on ordering the database in a smarter way, but hoping this part helps.
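One way to make the "smarter ordering" idea concrete (a sketch, assuming games can be keyed by a normalized opening prefix): SQLite's WITHOUT ROWID tables store rows in primary-key order, so keying on the prefix physically clusters games that share an opening, and a prefix query becomes a range scan over adjacent pages rather than thousands of scattered reads.

```python
# Sketch: clustering games on disk by opening prefix so a prefix query
# reads contiguous pages instead of random ones. SQLite WITHOUT ROWID
# tables store rows in primary-key order, so the composite key
# (opening_prefix, game_id) physically groups games sharing an opening.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE games (
        opening_prefix TEXT,   -- e.g. first N half-moves, normalized
        game_id INTEGER,
        pgn TEXT,
        PRIMARY KEY (opening_prefix, game_id)
    ) WITHOUT ROWID
""")
rows = [
    ("e4 c5", 1, "1. e4 c5 2. Nf3 d6 ..."),
    ("d4 d5", 2, "1. d4 d5 2. c4 e6 ..."),
    ("e4 c5", 3, "1. e4 c5 2. Nc3 Nc6 ..."),
]
con.executemany("INSERT INTO games VALUES (?, ?, ?)", rows)

# All games sharing an opening: a range scan over adjacent rows.
sicilian = con.execute(
    "SELECT game_id FROM games WHERE opening_prefix = ?", ("e4 c5",)
).fetchall()
print(sicilian)  # [(1,), (3,)]
```

Whether this beats other orderings depends on the query mix, of course; it only helps reads that correlate with the chosen key.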
Sure, I see your point. Obviously a wasteful format is also getting in the way of queryability. My point is that the main goal should be to improve queryability, which inherently requires some optimizing for storage, but that's secondary. As opposed to optimizing exclusively for data size.
Because in the former case it may still be best to accept some compromise (in the form of redundancy/simplicity) to hit the sweet spot.
Especially in the context of many comments that seem to have taken an extremely "code golf"-like approach towards the problem.
> Use StockFish to predict the next move, and only store the diff between the actual move and the prediction.
This ties the format down to one specific version of Stockfish, configured identically (stuff like the hash table size etc.), because all such factors have an impact on Stockfish's evaluations. One factor changes, and you can't decompress the backup.
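A toy round trip shows why (the little "book" predictor here is just a stand-in for a real engine): encoding replaces correctly predicted moves with a flag, decoding reconstructs them by querying the same predictor, so any drift between the encoder's and decoder's predictor silently corrupts the output.

```python
# Sketch of predictor-based move compression: store a flag when the move
# matches the predictor's guess, the literal move otherwise. The catch:
# decoding only works if the decoder's predictor behaves *identically*
# to the encoder's -- which is why tying the format to one exact
# Stockfish version and configuration is fragile.

def predict(history, book):
    """Deterministic toy predictor: looks up a reply in a 'book'."""
    return book.get(tuple(history))

def encode(moves, book):
    out = []
    for i, mv in enumerate(moves):
        guess = predict(moves[:i], book)
        out.append("HIT" if guess == mv else mv)  # flag vs. full move
    return out

def decode(tokens, book):
    moves = []
    for tok in tokens:
        moves.append(predict(moves, book) if tok == "HIT" else tok)
    return moves

book_v1 = {(): "e4", ("e4",): "c5"}
game = ["e4", "c5", "Nf3"]
enc = encode(game, book_v1)
assert decode(enc, book_v1) == game   # same predictor: round-trips fine

book_v2 = {(): "d4", ("e4",): "c5"}   # "new engine version"
print(decode(enc, book_v2))           # not the original game anymore
```

And unlike this toy book, a real engine's predictions also shift with search depth, hash size, and other settings, so pinning them all down forever is part of the format.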