But you pay the cost of retrying the failing tests and losing a clear signal. And if the application code itself is flaky, users get to experience the breakage too.
If an application is flaky then I want to know: How frequently does it fail? How does that depend on combinations of configuration parameters? How does it compare across the stable, master, and next branches? And so on.
The best way I know to do this is to write tests that are flaky precisely because they expose the underlying flakiness in the application.
If an application is flaky and its test suite passes 100% of the time, I'd be pretty suspicious of that test suite's adequacy.
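A minimal sketch of the measurement idea above: run the same end-to-end check many times per configuration combination and tally failure rates. Here `run_scenario` is a hypothetical stand-in for exercising the application once, with an invented failure mode for illustration; everything except the tallying structure is an assumption.

```python
import random
from itertools import product

def run_scenario(timeout_ms: int, retries: int) -> bool:
    """Hypothetical stand-in for one end-to-end check of the application.
    Returns True on success. Replace with a real check against your app."""
    # Invented failure model: shorter timeouts fail more often.
    return random.random() > (50 / timeout_ms)

def measure_flakiness(runs: int = 200) -> dict:
    """Tally the failure rate for each configuration combination."""
    results = {}
    for timeout_ms, retries in product([100, 500, 1000], [0, 3]):
        failures = sum(
            not run_scenario(timeout_ms, retries) for _ in range(runs)
        )
        results[(timeout_ms, retries)] = failures / runs
    return results

if __name__ == "__main__":
    for (timeout_ms, retries), rate in sorted(measure_flakiness().items()):
        print(f"timeout={timeout_ms}ms retries={retries}: {rate:.1%} failures")
```

Comparing the same table between branches (stable vs. master vs. next) is then just a diff of these rates, rather than a binary pass/fail.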
This is the only relevant factor. Forget the rest. Users don't experience your flaky tests, any more than they experience your messy Jira boards or your bad office coffee.
How do you know which is failing without exhaustive analysis?
See, once you know why the test fails and have confirmed the cause isn't the application under test (which is exceedingly rare in practice), you can just disable the test or fix it. But only once you're actually sure, not before.