Similarly there are many sites that allow you to log in using `your password` or...

foepys · on Oct 9, 2021

This doesn't always work, by the way. Once you venture outside of ASCII, it's quite often the case that uppercase(lowercase(x)) ≠ uppercase(x) and/or the other way around.

The German letter ß gets uppercased to SS instead of ẞ by most libraries in a neutral/generic culture. ẞ on the other hand gets lowercased to ß.

This happens because there wasn't an official ẞ in German until recently but the uppercasing/lowercasing standard was already written for ß.
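A minimal sketch of the ß/ẞ asymmetry described above, assuming Python's built-in `str.upper()` and `str.lower()`, which apply the Unicode default (language-neutral) case mappings. The variable names are only for illustration.

```python
# Python's str.upper()/str.lower() are locale-independent and use the
# Unicode default case mappings.
sharp_s = "\u00df"          # ß  LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ  LATIN CAPITAL LETTER SHARP S

upper_of_small = sharp_s.upper()            # ß uppercases to the two letters "SS"
lower_of_capital = capital_sharp_s.lower()  # ẞ lowercases to ß

# The round trip from the comment: uppercase(lowercase(x)) vs uppercase(x)
round_trip = capital_sharp_s.lower().upper()
direct = capital_sharp_s.upper()

print(upper_of_small, lower_of_capital, round_trip, direct)  # SS ß SS ẞ
print(round_trip == direct)                                   # False
```

Note that the uppercase form can also be longer than the input, so code that assumes case conversion preserves string length will be wrong for such input.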

naniwaduni · on Oct 9, 2021

This doesn't particularly matter since there's a perfectly good fallback for people who do happen to have "exotic" passwords: just type the password in correctly. As long as your mapping is consistent, it's totally fine if the bit sequence you're hashing is linguistically nonsense, because you're never going to display that to the user.

What you don't normally want to do is normalize passwords before hashing and then only store the hash of the normalized string, because that's fragile to changes in your normalization algorithm, e.g. updating your Unicode data tables.
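One way to read the advice above, as a rough sketch: store the hash of exactly what the user typed, and apply any case mapping only to the login attempt. The function names, the choice of PBKDF2, and the iteration count are assumptions made for this illustration, not a description of any site's actual login code.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # illustrative value only, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Hash the exact code points the user typed, with no normalization."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def candidate_inputs(typed: str) -> list[str]:
    """Variants to try at login; the exact input always comes first."""
    variants = [typed, typed.swapcase()]  # swapcase models an inverted caps lock
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def verify(typed: str, salt: bytes, stored_hash: bytes) -> bool:
    """Accept the attempt if any candidate hashes to the stored value."""
    return any(
        hmac.compare_digest(hash_password(candidate, salt), stored_hash)
        for candidate in candidate_inputs(typed)
    )
```

With this layout, a password such as `Straße1` still verifies when typed exactly, while the case-swapped attempt `sTRASSE1` simply fails, because `swapcase()` turns it into `Strasse1`. Nothing stored depends on the case tables, so a change in the mapping only affects the convenience fallback.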

johannes1234321 · on Oct 9, 2021

This becomes even more fun with the Turkish i, which looks innocent, but doesn't necessarily become I. http://www.i18nguy.com/unicode/turkish-i18n.html
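A short sketch of what the Turkish dotted and dotless i do under language-neutral case mapping, assuming Python's built-in string methods (which ignore the locale). A Turkish-aware library would give different answers for the plain `I`/`i` pair.

```python
dotted_capital_i = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"   # ı  LATIN SMALL LETTER DOTLESS I

# Lowercasing İ yields two code points: "i" plus U+0307 COMBINING DOT ABOVE.
lowered = dotted_capital_i.lower()
print(len(lowered), [hex(ord(c)) for c in lowered])  # 2 ['0x69', '0x307']

# Turkish rules would lowercase I to ı; the neutral mapping gives plain "i".
print("I".lower() == dotless_small_i)                # False

# ı uppercases to ASCII "I", so the round trip does not return to ı.
print(dotless_small_i.upper())                             # I
print(dotless_small_i.upper().lower() == dotless_small_i)  # False
```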

skitter · on Oct 9, 2021

Another case where it doesn't work is with keyboard layouts that use shift lock instead of caps lock.

McMiniBurger · on Oct 9, 2021

But don't password fields only recognize ASCII?

It seems I just can't type Korean into password fields.

jeroenhd · on Oct 9, 2021

Most competent websites I know accept general UTF8 characters like emoji perfectly fine. There are a lot of crappier websites that don't even have proper unicode support for usernames or profile descriptions out there, though, so your mileage may vary.

As far as I know, there's nothing preventing a password field from containing any valid unicode string. The problem may be IME support or servers stuck in ASCII, but the textbox itself will just work.

tsimionescu · on Oct 9, 2021

Even surprisingly big names are surprisingly bad at this. I don't know about recently, but Hotmail/Outlook used to have a rule of only allowing letters, numbers, and a handful of symbols, while also limiting you to at most 16 characters or something. You couldn't even type a space!

jrootabega · on Oct 9, 2021

They're not necessarily "bad" at it; there's a good chance they just want to make sure that the least competent of their users doesn't make a password that they have trouble with later. They don't care that security-conscious people get frustrated with it.

So I guess that could also be "bad," but not incompetent "bad" or Michael Jackson "bad."

thefreeman · on Oct 9, 2021

This is much more excusable for email providers, to prevent phishing. There are a ton of Unicode code points that are indistinguishable from ASCII letters. There are other security issues that can arise as well. Here is an example from Spotify: https://engineering.atspotify.com/2013/06/18/creative-userna...

tsimionescu · on Oct 12, 2021

I should have specified - this was (is?) for passwords, not usernames. I'm much more sympathetic to limited character sets in usernames, but I don't see much valid reason for doing so with passwords.

jeroenhd · on Oct 9, 2021

Honestly, that stuff only proves that big name websites aren't necessarily competent. PayPal used to let you register an account with a password longer than the maximum password length used in the authentication code, for example, essentially allowing you to set a password you could never use with your account again. Being worth billions doesn't mean you've got all the basics down, it just means you've tricked many people into giving you their business.

Even good websites that will accept any valid password string will sometimes cut off the last part of a long password because their hashing algorithm throws that data away. Bcrypt, for example, supports a maximum input length between 50 and 72 bytes, depending on the library you use to hash your passwords. That's bytes, not characters!

More primitive systems used to have problems with non-alphanumeric passwords, and once those algorithms have been unleashed upon the unsuspecting public, you need to support them in your login flow for years to come.
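To make the "bytes, not characters" point about bcrypt concrete, here is a small sketch that only models the truncation; it does not call a real bcrypt library. The 72-byte constant is the upper bound quoted above, and the helper names are made up for this example.

```python
BCRYPT_MAX_BYTES = 72  # upper bound quoted above; some libraries use less


def utf8_length(password: str) -> int:
    """Number of bytes the password occupies once encoded as UTF-8."""
    return len(password.encode("utf-8"))


def bytes_seen_by_hash(password: str, limit: int = BCRYPT_MAX_BYTES) -> bytes:
    """Model a hash that silently ignores everything past `limit` bytes."""
    return password.encode("utf-8")[:limit]


ascii_pw = "a" * 40
emoji_pw = "\U0001F511" * 20  # 20 key emoji, 4 bytes each in UTF-8

print(len(ascii_pw), utf8_length(ascii_pw))  # 40 40
print(len(emoji_pw), utf8_length(emoji_pw))  # 20 80

# Anything after byte 72 no longer influences the hash input.
print(bytes_seen_by_hash(emoji_pw) == bytes_seen_by_hash(emoji_pw + "extra"))  # True
```

A 20-character emoji password already exceeds the limit, so two passwords that differ only in their tail would be treated as the same by such a hash.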

spookthesunset · on Oct 9, 2021

For what it’s worth the “big” company I work for stores usernames in MySQL. 15 years ago when the username column was created it was set for ASCII (or whatever legacy charset it was). Changing it to utf8 would be a royal pain in the ass, requiring all kinds of testing and crazy updates across the entire company.

So while we’d love to make it utf8, it is just too much work to justify doing over other things.

tsimionescu · on Oct 12, 2021

I should have noted - I was talking about their restrictions for passwords, not usernames. Since those are hashed before storage, I think there are far fewer excuses for such limitations.

darkhorn · on Oct 9, 2021

This is a very English-alphabet-centric view.

> only costs a single bit

What if the password includes İ? The swapcase would be i. And then again its swapcase would be İ. And the swapcase of I is ı. And the swapcase of ı is I. Right? Well, it should depend on what language you use. Or should it?

Also, I think this was on GitHub: they ask for an uppercase letter, I enter Ğ, and GitHub doesn't recognize it as an uppercase letter.
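For what it's worth, here is how the swapcase question plays out with a language-neutral mapping, sketched with Python's `str.swapcase()`. The ASCII-only regex at the end is a hypothetical password rule written for this example, not GitHub's actual check.

```python
import re
import unicodedata

dotted_capital_i = "\u0130"  # İ

once = dotted_capital_i.swapcase()  # "i" + U+0307 COMBINING DOT ABOVE
twice = once.swapcase()             # "I" + U+0307, two code points

# The round trip is not the identity at the code point level...
print(twice == dotted_capital_i)                                # False
# ...although NFC normalization composes it back into a single İ.
print(unicodedata.normalize("NFC", twice) == dotted_capital_i)  # True

# Without Turkish tailoring, I pairs with i and ı maps to I.
print("I".swapcase(), "\u0131".swapcase())                      # i I

g_breve = "\u011e"  # Ğ
ascii_only_rule = re.compile(r"[A-Z]")      # hypothetical ASCII-only rule
print(bool(ascii_only_rule.search(g_breve)))  # False
print(g_breve.isupper())                      # True, Unicode-aware check
```

So whether Ğ counts as uppercase depends entirely on whether the check is written against ASCII ranges or against Unicode character properties.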

jikbd · on Oct 9, 2021

I remember this being the case on Facebook?

egeozcan · on Oct 9, 2021

It was possible to log in with the reverse of your password (as in password.split('').reverse().join('')).

piaste · on Oct 9, 2021

What was the reasoning? Unlike case or typos, you wouldn't accidentally type a password backwards.

p49k · on Oct 9, 2021

For a long time, in some browsers/OSes there was a bug (or perhaps an archaic feature that was accidentally triggered) where the cursor in an input could get stuck and cause all new characters to be inserted to the left; I'm assuming it's related to that.

Jenk · on Oct 9, 2021

Notably this was a bug on the input for _setting_ your password, so if you think you've set Password123, you might have actually set 321drowassP, so even after fixing the bug it would still bite many users.

Aachen · on Oct 9, 2021

This is the first I've heard of it, and as a Linux user I feel like it's the kind of thing I'd either know about or experienced first-hand. What kind of system would do that? And "for a long time", like, you can't ever login anywhere, it's kind of obvious and breaking functionality badly, how can this exist for more than a single release if at all?

p49k · on Oct 9, 2021

To be clear, it’s intermittent. Perhaps one in 500 times an input is focused, it exhibits this behavior.

I have experienced this so many times over my life with so many different hardware/software configurations, and I have to assume others have as well. It hasn’t happened in years but could explain why the “fix” described in the parent post was implemented.

input_sh · on Oct 9, 2021

This is in no way an educated guess, but it could be something about dealing with right-to-left language support?

SahAssar · on Oct 9, 2021

The actual character bytes do not go end-to-start in RTL text, so I have a hard time seeing it'd be that. I have no better guess though.
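A quick way to see the point about storage order, sketched with Python's standard `unicodedata` module. The sample word is an arbitrary Hebrew string chosen for this illustration and written with escapes in typing order.

```python
import unicodedata

# "shalom" in Hebrew, in the order the letters are typed:
# shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

first_typed = word[0]
print(unicodedata.name(first_typed))           # HEBREW LETTER SHIN
print(unicodedata.bidirectional(first_typed))  # R (right-to-left character)

# The encoded bytes also start with the first typed letter, even though a
# renderer draws that letter at the right-hand edge of the word.
print(word.encode("utf-8").startswith(first_typed.encode("utf-8")))  # True
```

The right-to-left layout is applied at display time, so the string a server receives from a password field is still in typing order.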

delaaxe · on Oct 9, 2021

But maybe humans were typing in reverse?

williamdclt · on Oct 9, 2021

I could imagine having the left arrow key accidentally pressed (or stuck), but that's pretty niche