It's from previous searches, actually. We have an 'enrichment' step after the initial rankings come back, which helps with semantic deduplication and tries to give us a canonical website domain. We store the Product and tag all matching rankings (e.g. https://productrank.ai/product/microsoft), and use a 3rd party to map website <-> brand logo.
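A minimal sketch of what grouping rankings under a canonical domain could look like. This is only an illustration, not the actual ProductRank pipeline: the function names and the input shape (name, URL pairs) are assumptions, and exact matching on a normalized host is a much cruder stand-in for real semantic deduplication.

```python
from collections import defaultdict
from urllib.parse import urlparse


def canonical_domain(url: str) -> str:
    """Reduce a URL or bare host to a lowercase host without a leading 'www.'."""
    # urlparse only fills in the host when a scheme or '//' prefix is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_by_domain(rankings):
    """Map canonical domain -> list of ranking names pointing at it.

    `rankings` is a hypothetical list of (name, url) pairs.
    """
    groups = defaultdict(list)
    for name, url in rankings:
        groups[canonical_domain(url)].append(name)
    return dict(groups)


rankings = [
    ("Microsoft", "https://www.microsoft.com"),
    ("Microsoft Corp.", "http://microsoft.com/en-us"),
    ("Google", "google.com"),
]
groups = group_by_domain(rankings)
```

Here both Microsoft spellings end up under the single key `microsoft.com`, which is the point at which a stored Product could be tagged with all of its matching rankings.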
I love these kinds of websites! I'm not into following crypto news, but the concept (and I'd say design) of this website is great! I look forward to the days when Gemini is more popular!
What really happened is that China hacked Google, and it pissed Google off when they finally discovered it, many months later. This wasn’t some small intrusion or data breach. It was a systemic, coordinated, widespread, very deep hack, which, among other things, gave China all of Google’s source code. It was very similar in scope and ambition to the recent SolarWinds hack out of Russia. It forced Google to completely rethink their security, which at the time was an Igloo model (hard on the outside, soft on the inside, a Gary Larson reference I think), and they had to migrate to a model where internal access was also limited, which took years.
I’ll share with you, confidentially wink wink, that most companies way overvalue their source code. It’s actually their engineers who are their biggest asset, because the engineers can reproduce the source code if it’s lost (quite quickly at that), whereas the source code can’t do fuck-all on its own. Moreover, most source code bases are so ugly that you couldn’t give them away. But Google had what they now call “HIP” (High-value Intellectual Property) scattered through their source code, which are the tuning parameters and constants for various algorithms and AI models which are the true Secret Sauce to Google’s Search and Ads dominance.
At both small companies I worked at, the real value was in the business relationships built up (suppliers etc). The code could be rewritten. It wouldn't be bug-compatible, but all the ideas and design were backed up in engineers' minds.
There are even advantages to a clean-slate approach, if there’s sufficient buy-in from management.
Reason being: if you have done things before and borne the weight of the architectural mistakes, you can clearly reproduce the software.
The biggest barriers to doing this are normally:
A) time pressure.
B) pressure to use existing, readily available tech.
C) pressure to be cheap (which most often shows up as A and B, but is its own thing too).
This causes “large rewrites” to be rushed and subpar, as the emphasis often isn’t on getting to feature parity.
Regardless, if Google lost their code entirely, we would probably have search by next week, auth in a fortnight, mail by next month and something approximating Google Cloud by the end of the year.
Because a lot of the really hard lessons have been learned, a reimplementation is just a matter of time.
I do. If, and only if, you build up the exact same thing. Once engineers band together for a rewrite that changes the project's architectural, conceptual, and engineering principles, then you're looking at a runaway deadline.
Most snippets of Google's code wouldn't be usable without the entire ecosystem around them. Most things are doable only because of libraries, internal infrastructure, etc.
Leaks of what, exactly? It's billions of lines of code spread amongst millions of files which make up thousands of services... It's not like there are 10 files with all the magic sauce.
To add to this, it's not all on every engineer's machine. It's not like they can just walk out of the office with a computer with all the source code. It's on a virtual file system that requests files as needed, and you can bet there are systems set up to detect if someone is just downloading the whole thing for no reason.
It's not even that. Even if you assume that the entire codebase was leaked verbatim, it would still take the entrenched knowledge of many Google engineers to replicate the same services and the infrastructure to support those services.
If any state actor who stole that code could deploy such engineering capabilities, they don't really need the source code tbh.