*File replacement can be done now through non-portable renaming gyrations.* I wa...

wahern · on Sept 8, 2020

It's atomic but it might not be persistent in the event of a crash. If the metadata wasn't committed then upon remounting the old file might reappear, and its likelihood of reappearing is independent of any other operations on that filesystem. Though, traditionally rename was not only atomic but also preferentially ordered ahead of other metadata operations, so subsequent renames of other files wouldn't be visible unless the older rename had appeared too.

EDIT: Actually, I think the issue I had in mind was that writes to the new file might not be committed before the rename, so if you do open + write + rename + close, on a crash the rename might persist even though the write didn't. You technically should use fdatasync if you want the write + rename to be ordered. See, e.g., https://lwn.net/Articles/322823/ But this is such unintuitive behavior that I think that even for ext4 the more reliable behavior is still the default.
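For illustration, the open + write + fdatasync + rename ordering described above could look roughly like this in Python. This is only a sketch: `replace_file` and the `.tmp` suffix are made-up names, and it falls back to `os.fsync` on platforms where `os.fdatasync` is not available.

```python
import os

# Use fdatasync where the platform offers it, otherwise plain fsync.
_sync = getattr(os, "fdatasync", os.fsync)

def replace_file(path, data):
    """Hypothetical helper: replace `path` with `data` (bytes) via a temp file."""
    tmp = path + ".tmp"  # assumed naming scheme; lives in the same directory
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to commit the file contents before the rename,
        # so a crash should not leave the new name pointing at missing data.
        _sync(fd)
    finally:
        os.close(fd)
    # Atomically swap the new file into place.
    os.rename(tmp, path)
```

Note that this only orders the write ahead of the rename; it does not by itself make the rename durable.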

jacobsenscott · on Sept 8, 2020

Isn't this true of any atomic operation in the event of a crash? You'll either see the old data or the new data when the system comes back up.

wahern · on Sept 8, 2020

Yes. But traditionally Unix filesystems provided stronger consistency guarantees than required by POSIX, and applications have come to rely on them. Actually, in the event of a crash I think POSIX specifies implementation-defined behavior even for fsync.

formerly_proven · on Sept 8, 2020

Making fsync a no-op and not having any durability at all is perfectly POSIX compliant. POSIX only concerns itself with the visibility of I/O in the live system (e.g. a process either sees a write fully realized, or not at all - this didn't use to be true for larger writes until fairly recently, btw.). POSIX specifies nothing about what happens when a system is restarted or loses power.

ori_b · on Sept 9, 2020

The alternative would not allow for a POSIX-compliant implementation of ramfs.

the8472 · on Sept 8, 2020

The dance can get even more complicated[0] if you also want to ensure that the rename itself is persisted. But you only really have to encode this in an atomic-file-write-with-callback library once.
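A rough sketch of what such an atomic-file-write-with-callback helper might look like in Python, assuming a Unix-like system where a directory can be opened and fsynced. The name `atomic_write` and the `.tmp-` prefix are invented for the example, and `tempfile.mkstemp` creates the file with mode 0600, which a real library would probably want to adjust.

```python
import os
import tempfile

def atomic_write(path, write_callback):
    """Hypothetical helper: write_callback(f) fills a binary file object."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created next to the target so rename stays on one fs.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            write_callback(f)
            f.flush()              # drain Python's userspace buffer
            os.fsync(f.fileno())   # commit file contents before the rename
        os.rename(tmp, path)
    except BaseException:
        # Best-effort cleanup so a failed write leaves no stray temp file.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Fsync the containing directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage would be something like `atomic_write("config.json", lambda f: f.write(b"{}"))`. If the callback raises, the old file is left untouched.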

[0] https://danluu.com/file-consistency/

simias · on Sept 8, 2020

I agree with you, for me the main limitation of this system is that you have to be sure that the temporary file exists in the same fs as the target, which in general involves creating the temporary file in the same directory. It works well, but I always feel icky creating temporary files in the middle of the filesystem (which may end up lingering if my program crashes before I could issue the rename syscall).

But beyond that I'm not aware of any portability issue, except maybe with weird filesystems like NFS in some non-standard configuration that doesn't implement strict locking.
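As a rough illustration of the same-filesystem constraint, one could compare device IDs before deciding where the temporary file goes. Both function names below are hypothetical, and matching `st_dev` values are only a heuristic (bind mounts, for instance, can still make rename fail with EXDEV), so treat this as a sketch.

```python
import os

def same_filesystem(path_a, path_b):
    """Heuristic: two existing paths share a filesystem if st_dev matches."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

def pick_temp_dir(target, preferred):
    """Hypothetical helper: use `preferred` for temp files only if a rename
    from there to `target` should not cross filesystems; otherwise fall back
    to the target's own directory."""
    target_dir = os.path.dirname(os.path.abspath(target))
    if same_filesystem(target_dir, preferred):
        return preferred
    return target_dir
```

This would let a program keep its temp files in a dedicated scratch directory when that happens to live on the same filesystem, and only drop them next to the target when it has to.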

inopinatus · on Sept 8, 2020

It can be re-ordered w.r.t. a hard link and any derived written block data (e.g. indexes). I've seen this first-hand with early maildir implementations across a crash, and that was even without throwing the spanner of NFS into the gears.