Sorry for the late reply, I don't use HN much. No idea if you'll actually notice this; does HN even have a "reply notification" feature?

Regarding what you wrote, I pretty much agree. As I said, I am not an expert in this field, so I am not aware of the most cutting-edge stuff out there. But even the few languages I know and have seen are so different from each other (some more than others) that it seems unlikely a single "theory of everything" would suffice for text, especially given the way we process text today.

Perhaps there is some way to abstract out the differences, but I don't really see how. After all, characters are only where the differences begin. Start thinking about words or sentences, and no single approach seems viable with the way we do string processing today.

You probably expected a more substantial comment, but I don't really know enough about this field to make one.

Regarding क्स and डे, the difference between them is that the former combines two consonants, क and स joined by a virama (pronounced "ks"), while the latter is a consonant plus a vowel sign ("de"). However, you can't tell which is which from the visual representation alone, since डा (consonant + vowel) also looks like two characters. If you copy these into a text field and try to erase them with backspace or delete, you should see how it all works (assuming the text field behaves correctly).
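To see what sits underneath, here is a small Python sketch that lists the Unicode code points in each example. The `backspace` function is a hypothetical, simplified model of a text field that deletes one code point per keypress; real editors vary, and whether they treat क्स as one unit or two depends on which version of Unicode's grapheme-cluster rules (UAX #29) they implement.

```python
import unicodedata

# The three examples, written as explicit code points so the sequences are
# unambiguous no matter how they render.
SAMPLES = {
    "ksa": "\u0915\u094D\u0938",  # क्स : KA + VIRAMA + SA (conjunct)
    "de": "\u0921\u0947",         # डे : DDA + VOWEL SIGN E
    "daa": "\u0921\u093E",        # डा : DDA + VOWEL SIGN AA
}

def code_points(text):
    """Return (U+XXXX, Unicode name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

def backspace(text):
    """Hypothetical editor model: one backspace removes the last code point."""
    return text[:-1]

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, text, "code points:", len(text))
        for cp, name in code_points(text):
            print("   ", cp, name)
        # Under this model, डे loses its vowel sign and क्स loses the स,
        # leaving क with a visible virama.
        print("    after one backspace:", code_points(backspace(text)))
```

Note that क्स is three code points even though it has only two consonants; the virama in the middle is what tells the renderer to join them.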

But again, these confusions only exist because Devanagari allows simple characters to combine into compound characters. That is obviously completely different from how Roman script works, which in turn is probably completely different from logographic scripts such as Chinese. So how do we reconcile the differences (other than by hiring native speakers of every language out there)? I wish I knew, but currently I don't.