It's not an AI issue, just a small matter of having lots of rules. Moreover, this is not just an issue for non-Western languages: the character â (lower case "a" with a circumflex) can be represented either as the single code point U+00E2 or as an "a" (U+0061) followed by a combining circumflex (U+0302). Furthermore, Unicode implementations are required to evaluate these two versions as being equal in string comparisons, so if you search for the combined version in a document, the search should find the single-code-point instances as well.
> Unicode implementations are required to evaluate these two versions as being equal in string comparisons
What do you mean by "required"? There are different forms of string equality. You can plausibly compare the actual codepoint sequences, compare the NFC or NFD forms, or compare the NFKC or NFKD forms. And heck, there's also comparing strings while ignoring diacritics.
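To make the distinction concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper names (`equal_raw`, `equal_canonical`, `equal_compat`) are made up for illustration; they are not part of any library.

```python
import unicodedata

precomposed = "\u00e2"   # â as a single code point
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT

def equal_raw(a, b):
    # Strict comparison of the code point sequences.
    return a == b

def equal_canonical(a, b):
    # Canonical equivalence: compare the NFC forms (NFD would work as well).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def equal_compat(a, b):
    # Compatibility equivalence: compare the NFKC forms.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

print(equal_raw(precomposed, decomposed))        # False
print(equal_canonical(precomposed, decomposed))  # True
print(equal_canonical("\ufb01", "fi"))           # False: the "fi" ligature is only compatibility-equivalent
print(equal_compat("\ufb01", "fi"))              # True
```

Which of the three is "right" depends on what the comparison is for, which is rather the point.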
Any well-behaved software that's operating on user text should indeed do something other than just comparing the codepoints. In the case of searching a document, it's reasonable to do a diacritic-insensitive search, so if you search for "e" you could find "é" and "ê". But that's not true of all cases.
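A diacritic-insensitive search of the kind described above can be sketched by decomposing to NFD and dropping the combining marks. This is only a rough approximation under that assumption, and `strip_marks` and `contains_ignoring_diacritics` are hypothetical names. Letters such as "ø" have no canonical decomposition, so this approach leaves them untouched.

```python
import unicodedata

def strip_marks(text):
    # Decompose, then drop every character with a non-zero combining class.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains_ignoring_diacritics(haystack, needle):
    # Substring search on the mark-stripped forms of both strings.
    return strip_marks(needle) in strip_marks(haystack)

print(contains_ignoring_diacritics("café", "e"))     # True
print(contains_ignoring_diacritics("crêpe", "cre"))  # True
print(strip_marks("ø") == "ø")                       # True: no decomposition, so it is not folded to "o"
```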
(OK, so "required" might be overstating it; you are perfectly free to write a program that doesn't conform to the standard. But most people will consider that a bug unless there is a good reason for it.)
Unicode defines equivalence relations, yes. But nowhere is a program that uses Unicode required to use an equivalence relation whenever it wishes to compare two strings. It probably should use one, but there are various reasons why it might want strict equality for certain operations.
In some languages those accented characters would be different letters, sometimes appearing far away from each other in collation order. In other cases they are basically the same letter. And in Hungarian, 'dzs' is a single letter.
Different languages can define different collation rules even when they use the same graphemes. For example, in Swedish z < ö, but in German ö < z. Same graphemes, different collation.
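As a rough illustration of the Swedish/German difference, here is a toy Python sketch with hand-written sort keys. It assumes a simplified rule set: Swedish places å, ä, ö after z, and German dictionary-style ordering treats ö like o. Real collation (CLDR/ICU tailorings) is far more involved, and `swedish_key` and `german_key` are hypothetical helpers.

```python
import unicodedata

# Assumed Swedish alphabet order: å, ä, ö come after z.
_SV_RANK = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzåäö")}

# Assumed German dictionary-style folding: umlauts sort like their base letters.
_DE_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def swedish_key(word):
    word = unicodedata.normalize("NFC", word).lower()
    # Characters outside the alphabet sort after everything else.
    return [_SV_RANK.get(ch, len(_SV_RANK)) for ch in word]

def german_key(word):
    return unicodedata.normalize("NFC", word).lower().translate(_DE_FOLD)

words = ["zon", "öra", "ost"]
print(sorted(words, key=swedish_key))  # ['ost', 'zon', 'öra']
print(sorted(words, key=german_key))   # ['öra', 'ost', 'zon']
```

Same three strings, two different orders, depending only on which language's rules you pick.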
And we may even have more than one set of collation rules within the same language.
E.g. Norwegian had two common ways of collating æ,ø,å and their alternative forms ae, oe and aa. Phone books used to collate "ae" with æ, "oe" with ø and "aa" with å, while in other contexts "ae", "oe" and "aa" would often be collated based on their constituent parts. It's a lot less common these days for the pairs to be collated with æøå, but still not unheard of.
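The two Norwegian conventions can be sketched as two sort keys over the same alphabet. This is a toy sketch, not a faithful implementation: `letterwise_key` and `phonebook_key` are made-up names, and the blind digraph replacement would also rewrite words where "ae" or "oe" is not a stand-in for æ or ø.

```python
# Assumed Norwegian alphabet order: æ, ø, å come after z.
_NO_RANK = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzæøå")}

def letterwise_key(name):
    # Collate "aa", "ae", "oe" by their constituent letters.
    return [_NO_RANK.get(ch, len(_NO_RANK)) for ch in name.lower()]

def phonebook_key(name):
    # Old phone-book style: treat the digraphs as å, æ, ø before ranking.
    folded = name.lower().replace("aa", "å").replace("ae", "æ").replace("oe", "ø")
    return letterwise_key(folded)

names = ["Aas", "Berg", "Zahl", "Åberg"]
print(sorted(names, key=letterwise_key))  # ['Aas', 'Berg', 'Zahl', 'Åberg']
print(sorted(names, key=phonebook_key))   # ['Berg', 'Zahl', 'Åberg', 'Aas']
```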
Of course, it truly becomes entertaining to sort things out once you mix in "foreign" characters. E.g., I would be inclined to collate ö together with ø if collating predominantly Norwegian strings, since ö used to be fairly commonly used in Norway too, but these days you might also find it collated with "o".