How have you measured the power usage/cost? That seems like a incredibly high price for electricity, similar to a 600W constant load in my part of the world.
All of my IT equipment in my office is running through a single UPS that measures power consumption.
I do have a bit more than just that server hooked up to it. There's also a Dell i5 running DDWRT as my main gateway/router, the fiber internet modem, a small Synology NAS, a couple of WIFI routers, etc. It all adds up.
That doesn't include my backup server out in the garage with another 8-disk RAID10 array and an LTO tape drive that is often backing up data, 5 more WIFI routers around the property, and 10 or so security cameras. So I'm probably well over $100/mo for all my tech stuff.
Interesting comment, since v4 is the only version that provides the maximal random bits and is recommended for use as a primary key for non-correlated rows in several distributed databases to counter hot-spotting and privacy issues.
Edit: Context links for reference, these recommend UUIDv4:
Yeah, I thought it was a strange comment, too. v7 is great when you explicitly need monotonicity, but encoded timestamps can expose information about your system. v4 is still very valid.
I think "outdated" was a poor choice of words. It is a failure to meet application requirements, which has more to do with design than age. Every standardized UUID is expressly prohibited in some application contexts due to material deficiencies, including v4. That includes newer standards like v7 and v8.
In practice, most orgs with sufficiently large and complex data models use the term "UUID" to mean a pure 128-bit value that makes no reference to the UUID standard. It is not difficult to find yourself with a set of application requirements that cannot be satisfied with a standardized UUID.
The sophistication of our use case scenarios for UUIDs exceeds their original design assumptions. They don't readily support every operation you might want to do on a UUID.
You're talking about the hash-based UUIDv3/v5? I haven't found examples of those being used, but I'm curious.
Using MD5 or 122 bits of a SHA1 hash seems questionable now that both algorithms have known collisions. Using 122 bits of a SHA2/3 seems pretty limited too. Maybe if you've got trusted inputs?
I use these a lot. My favorite use case is templates, especially ones that were not initially planned in the architecture.
Let's say i have some entity like an "organization" that has data that spans several different tables. I want to use that organization as a "parent" in such a way where i can clone them to create new "child" organizations structured the same way they are. I also want to periodically be able to pull changes from the parent organization down into the child organization.
If the primary keys for all tables involved are UUIDs, I can accomplish this very easily by mapping all IDs in the relevant tables `id => uuid5(id, childOrgId)`. This can be done to all join tables, foreign keys, etc. The end result is a perfect "child" clone of the organization with all data relations still in place. This data can be refreshed from the parent organization any time simply by repeating the process.
I remember using them in a massive SQL query that needed to generate a GIS data set from multiple tables with an ungodly amount of JOINs and sub-queries to achieve ID stability. Don't ask :p
For those ~~curious~~ worried, no, this was not a security sensitive context.
Common one is if you want two structs deemed "equivalent" based on a few fields to get the same ID, and you're only concerned about accidental collision. There are valid use cases for that, but I've also seen it misused often.
v7 rough ordering also helps as a PK in certain sharded DBs, while others want random, or nonsharded ones usually just serial int.
It's the same reason we use UTF-8. It's well supported. UUIDs are well supported by most languages and storage systems. You don't have to worry about endianness or serialization. It's not a thing you have to think about. It's already been solved and optimized.
Now generate your random ID. Did you use a CSPRNG, or were your devs lazy and just used a PRNG? Are you doing that every time you're generating one of these IDs in any system that might need to communicate with your API? Or maybe they just generated one random number, and now they're adding 1 every time.
Now transfer it over a wire. Are you sure the way you're serializing it is how the remote system will deserialize it? Maybe you should use a string representation, since character transmission is a solved problem with UTF-8. OK, so who decides what that canonical representation is? How do we make it recognizable as an ID without looking like something that people should do arithmetic with?
None of these are rocket-science problems, they're just standardization issues. You build a library with your generate_id/serialize_id/deserialize_id functions that work with a wrapper type, and tell your devs to use that library. UUID libraries are exactly that, except backed by an RFC.
Of course they're not rocket science. But, the question here is, "Why don't you use random 16 bytes instead of a UUIDv4?" It's not a question about rocket science. The answer is still, "Because UUIDv4 is still a better way to do it." The UUID standard solves the second and third tier problems and knock-on effects you don't think about until you've run a system for awhile, or until you start adding multiple information systems that need to interact with the same data.
But, using UUIDv4 shouldn't be rocket science, either. UUID support should be built in to a language intended for web applications, database applications, or business applications. That's why you're using Go or C# instead of C. And Go is somewhat focused on micro-service architectures. It's going to need to serialize and deserialize objects regularly.
> Now generate your random ID. Did you use a CSPRNG, or were your devs lazy and just used a PRNG?
There's nothing about UUIDs that need to make them cryptographically secure. Many programming language libraries don't (and some explicitly recommend against using them if you need cryptographically strong randomness).
Not for security but to make sure you don't accidentally reuse the same seed. I've done that before when the PRNG seed was the time the application started and it turns out you can run multiple instances at the same time.
128 random bits in some random format aren't a uuid. 0.2ml of water isn't a raindrop. If I say "you can provide me with a uuid" and you give me a base64-encoded string, it's getting rejected by validation. If I say "this text needs to be a Unicode string" and you give me a base64-encoded Unicode string's byte array, it's not going to go well.
Why are you implying that converting from base64 to and from standard UUID representation (hyphen-delimited hexadecimal) is more than a trivial operation? Either client or server can do this at any point.
Does Postgres not truly support UUID because it internally represents it as 128 bits instead of a huge number of encoded bytes in the standard representation? Of course not.
Ah, here we are. If it's just bytes, why store it as a string? Sixteen bytes is just a 128-bit integer, don't waste the space. So now the DB needs to know how to convert your string back to an integer. And back to a string when you ask for it.
"Well why not just keep it as an integer?"
Sure, in which base? With leading zeroes as padding?
But now you also need to handle this in JavaScript, where you have to know to deserialize it to a Bigint or Buffer (or Uint8Array).
UUIDs just mean you don't need to do any of this crap yourself. It's already there and it already works. Everything everywhere speaks the same UUIDs.
You have to generate random bytes with sufficient entropy to avoid collisions and you have to have a consistent way to serialize it to a string. There's already a standard for this, it's called UUID.
It’s really not that complicated a problem. Don’t worry, you’ll certainly be able to solve all the problems yourself as you encounter them. What you end up with will be functionally equivalent to a proper UUID and will only have cost you man-months of pain, but then you will be able to truly understand the benefit of not spending your effort on easy problems that someone solved before you.
It's not a huge problem. Uuid adds convenience over reinventing that wheel everywhere. And some of those wheels would use the wrong random or hash or encoding.
Really? Doesn’t v4 locally make the inserts into the B-Tree pretty messy? I was taught to use v7 because it allows writes to be a lot faster due to memory efficient paging by the kernel (something you lose with v4 because the page of a subsequent write is entirely random).
https://www.thenile.dev/blog/uuidv7#why-uuidv7 has some details:
"
UUID versions that are not time ordered, such as UUIDv4 (described in Section 5.4), have poor database-index locality. This means that new values created in succession are not close to each other in the index; thus, they require inserts to be performed at random locations. The resulting negative performance effects on the common structures used for this (B-tree and its variants) can be dramatic.
".
1. Users - your users table may not benefit by being ordered by created_at ( or uuid7 ) index because whether or not you need to query that data is tied to the users activity rather than when they first on-boarded.
2 Orders - The majority of your queries on recent orders or historical reporting type query which should benefit for a created_at ( or uuidv7 ) index.
Obviously the argument is then you're leaking data in the key, but my personal take is this is over stated. You might not want to tell people how old a User is, but you're pretty much always going to tell them how old an Order is.
There's also a hot spot problem with databases. That's the performance problem with autoincrement integers. If you are always writing to the same page on disk, then every write has to lock the same page.
Uuidv7 is a trade off between a messy b-tree (page splits) and a write page hot spot (latch contention). It's always on the right side of the b-tree, but it's spread out more to avoid hot spots.
That still doesn't mean you should always use v7. It does reversibly encode a timestamp, and it could be used to determine the rate that ids are generated (analogous to the German tank problem). If the uuidv7 is monotonic, then it's worse for this issue.
In distributed databases I've worked with, there's usually something like a B-tree per key range, but there can be thousands of key ranges distributed over all the nodes in the cluster in parallel, each handling modifications in a LSM. The goal there is to distribute the storage and processing over all nodes equally, and that's why predictable/clustered IDs fail to do so well. That's different to the Postgres/MySQL scenario where you have one large B-tree per index.
I believe current official guidance if you want a lot of random data is to use v8, the "user-defined" UUID. The use of v4 is strictly less flexible here.
No, UUIDv8 offers 122 bits for vendor specific or experimental use cases. If you fill those bits randomly, you get the same amount of randomness as a v4. The spec is explicit that it does not replace v4 for random data use case.
> To be clear, UUIDv8 is not a replacement for UUIDv4 (Section 5.4) where all 122 extra bits are filled with random data.
What if I want an ID in the URL? Parse it back and forth? And what if for example, nodejs's UUID api only gives me the string representation of the ID?
To minimize the storage space while having a URL-safe representation, yeah you'd want to serialise/deserialise on the boundary of presenting it to API consumers. I think the same for any ID that has an efficient binary representation as well as needing to represent it in ASCII.
I have a machine with a 6 year uptime that was slowly accumulating single bit error corrections. The EDAC counter mysteriously stopped at 308 last year, and hasn't changed since, so I wonder if a bitflip in the counter circuit made it stop...
The actual RAM chips on a ECC DIMM are exactly the same as a non-ECC DIMM, there's just an extra 1/2/4 chips to extend to 72 bit words.
The main reason ECC RAM is slower is because it's not (by default) overclocked to the point of stability - the JEDEC standard speeds are used.
The other much smaller factors are:
* The tREFi parameter (refresh interval) is usually double the frequency on ECC RAM, so that it handles high-temperature operation.
* Register chip buffers the command/address/control/clock signals, adding a clock of latency the every command (<1ns, much smaller than the typical memory latency you'd measure from the memory controller)
* ECC calculation (AMD states 2 UMC cycles, <1ns).
I think it is still more likely that a project can be improved if everyone has access to the source code vs not having access to any source code. Are there counterexamples to that?
Intel seem to be deliberately hiding the clock frequency of this thing, the xeon-6-plus-product-deck.pdf has no mention of clock frequency or how LLC is shared.
Even if they haven't added any cache control headers, what kind a of lazy Meta engineer designed their crawler with to just pull the same URL multiple times a second?
Is this where all that hardware for AI projects is going? To data centers that just uncritically hits the same URL over and over without checking if the content of a site or page has chanced since the last visit then and calculate a proper retry interval. Search engine crawlers 25 - 30 years ago could do this.
Hit the URL once per day, if it chances daily, try twice a day. If it hasn't chanced in a week, maybe only retry twice per week.
Forgejo does set "cache-control: private, max-age=21600", which is considerably more than one second, but I grant it uses the "private" keyword for no reason here.
reply