Huh, spaces. There's way too much software, especially on Windows, that breaks w...

DnDGrognard · on Nov 11, 2021

I had a really odd one last year where a Grave I (well-known brand name) got converted by Office/Excel into a Double Grave I.

The double grave I is used by some obscure Orthodox religious texts.

DarkWiiPlayer · on Nov 11, 2021

A friend had the username "Rubén" and jfc it broke everything other than Windows itself xD

dhosek · on Nov 11, 2021

The problem isn't the Cyrillic or the é but the fact that Windows lets you put those characters in file names in non-Unicode encodings which will create sequences of bytes which are invalid UTF-8. It's 2021, FFS, stop using legacy encodings.
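To make that concrete, here's a rough Python sketch. It assumes the name got written out in cp1252 (that code page is just my assumption for illustration; other legacy 8-bit code pages behave similarly for non-ASCII characters), and `is_valid_utf8` is a made-up helper, not a library function.

```python
# Assumption: the name was stored using the legacy cp1252 code page.
name = "Rub\u00e9n"  # "Rubén" with a precomposed é (U+00E9)

legacy_bytes = name.encode("cp1252")  # é becomes the single byte 0xE9
utf8_bytes = name.encode("utf-8")     # é becomes the two bytes 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(legacy_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

The byte 0xE9 announces a three-byte UTF-8 sequence, but the next byte is a plain ASCII `n`, so anything that assumes UTF-8 chokes on it.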

grishka · on Nov 11, 2021

All Win32 functions that accept or return strings come in two varieties, with A and W suffixes, e.g. MessageBoxA/MessageBoxW. The A variant works with the system default 8-bit encoding (cp1251 in the case of Cyrillic), the W variant works with Unicode in wide chars. There shouldn't be much of a problem with string handling if you stick exclusively with W functions.
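A small Python sketch of what the two families end up seeing, just modelling the byte layouts rather than calling Win32. It assumes a system whose default code page is cp1251; `fits_code_page` is a hypothetical helper.

```python
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет", six Cyrillic letters

a_style = text.encode("cp1251")     # roughly what an A function gets: 1 byte per letter
w_style = text.encode("utf-16-le")  # roughly what a W function gets: 2 bytes per letter

print(len(a_style), len(w_style))   # 6 12


def fits_code_page(s: str, code_page: str) -> bool:
    """Hypothetical helper: can this string survive a trip through the A side?"""
    try:
        s.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


print(fits_code_page(text, "cp1251"))          # True
print(fits_code_page("Rub\u00e9n", "cp1251"))  # False, cp1251 has no é
```

The W side can carry both strings, which is why sticking to it avoids the problem.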

ziml77 · on Nov 11, 2021

Using the W functions has been the advice from Microsoft's documentation for ages. But people still use the A functions because they're easier, especially when writing cross-platform software since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide.

Fortunately the future of the Windows API does look better, since Microsoft has added proper UTF-8 support as of Win 10 1903. All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.

grishka · on Nov 11, 2021

> since Windows is the only major OS that made the unfortunate choice of having the base character type 16 bits wide

Apple OSes use something they call "unichar" inside NSStrings. I'm not 100% sure what it is, but it feels like it's the same 16-bit wide character.

ziml77 · on Nov 11, 2021

It's possible! It seemed like a sensible choice back in the early 90s when the answer to making a system for global use was UCS-2. I know Java was another one that went with that decision.
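For anyone wondering why UCS-2 stopped being enough: a 16-bit unit can only cover the first 65,536 code points, so UTF-16 needs two units (a surrogate pair) for anything beyond that. A quick Python sketch, where `utf16_code_units` is a name I made up:

```python
def utf16_code_units(s: str) -> int:
    """Count the 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


e_acute = "\u00e9"       # é, inside the Basic Multilingual Plane
emoji = "\U0001F600"     # a grinning-face emoji, outside the BMP

print(utf16_code_units(e_acute))  # 1
print(utf16_code_units(emoji))    # 2

# The two units are the surrogates D83D and DE00, stored little-endian.
print(emoji.encode("utf-16-le").hex())  # 3dd800de
```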

mjevans · on Nov 11, 2021

I would rather they added a U suffixed version and better still backported that all the way to Win 7. Now in 3-7 years people can write programs that use the A functions, but have to check the version of Windows and refuse to run if it isn't new enough.

colejohnson66 · on Nov 11, 2021

There’s been some talk of repurposing the A variants to work on UTF-8

account42 · on Nov 12, 2021

> All you have to do is request it in the application manifest and the A functions will accept and return UTF-8.

They really should have gone with WTF-8 [0] since the W functions generally accept WTF-16 and not just the valid UTF-16 subset.

[0] https://simonsapin.github.io/wtf-8/
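A rough illustration in Python, using the `surrogatepass` error handler as a stand-in. That's an approximation on my part, not a real WTF-8 implementation: it matches WTF-8 for a lone surrogate, but it would also encode a paired surrogate as two 3-byte sequences, which WTF-8 doesn't allow.

```python
lone = "\ud800"  # an unpaired high surrogate, which the W functions will happily pass around

# Strict UTF-8 has no representation for it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogatepass writes it out as an ordinary 3-byte sequence instead.
wtf8_like = lone.encode("utf-8", "surrogatepass")
round_trip = wtf8_like.decode("utf-8", "surrogatepass")

print(strict_ok)         # False
print(wtf8_like.hex())   # eda080
print(round_trip == lone)  # True
```

So a strictly-UTF-8 A function has no lossless way to hand back a name like that, while a WTF-8 one would.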