The problem I leap to is that this article seems to suggest that our thinking process is Bayesian. That is unlikely.
We update our beliefs based on new information, but the whole point of Bayesianism via Bayes' Theorem is updating them by a very specific amount based on the strength of the evidence. Nobody is approximately Bayesian in their thought process, and I doubt most people can even be trained to be. Statistics is a pen & paper exercise for the most part.
In my experience, which is not negligible, the hardest part of statistics is talking people down from beliefs they have settled on because of something that looks like statistical evidence but in fact is not.
Bayesian approaches are gaining momentum in the cognitive sciences. See for instance the article "The Bayesian brain" by Knill and Pouget.
There is also Cox's theorem, which essentially states that probability theory is the only sound system that extends logic to handle uncertainty, replacing the two truth values 0 and 1 with continuous values between 0 and 1.
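To spell out the punchline: Cox's desiderata force any consistent plausibility measure to obey, up to rescaling, the familiar product and sum rules (this is the standard statement of the result, not anything specific to the article):

    P(A \wedge B \mid C) = P(A \mid C)\,P(B \mid A \wedge C)    (product rule)
    P(A \mid C) + P(\neg A \mid C) = 1                          (sum rule)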
Fair point, the beginning section does seem to suggest that. I have a note in the middle explaining this is not the case.
I make the connection to demonstrate that Bayes is not some arcane statistical artifact, but qualitatively in line with intuition, albeit miscalibrated. This opens up avenues to then move towards ideal Bayesian-ness.
Ah, hey, didn't see you were also the submitter. I enjoyed your writing; it is interesting to run across systems thinkers.
And just to fill up the word count: if you haven't read up on the major statistical paradoxes, it's possible you'll enjoy them. https://en.wikipedia.org/wiki/Category:Statistical_paradoxes for your attention. If you are playing with stats for the framework, then the other half of the fun is delving into the paradoxes; knowing them by heart is a great trick for interpreting evidence. Simpson's, Berkson's and the Elevator Paradox explain a lot of life.
A better, more satisfying explanation for Simpson's Paradox comes from Pearl et al.'s work in modern causation theory, in which they point out that the paradox exists only because we fail to consider the causal structure that generated the apparently paradoxical data. If we take that structure into account and are clear about the causal question we are trying to answer, the paradox disappears. For a recent example involving COVID-19, check out this blog post:
I thought it was abundantly obvious that the author meant it in a qualitative way, i.e. we update our certainty as evidence comes in. Not necessarily in the quantitatively correct amount.
Controversial opinion: Bayes' Theorem is overrated. In real life we usually have no idea about priors, and we have close to zero chance of getting any good estimate of the true probability of something. But we can still get by fine for the most part by focusing on limiting possible loss and staying on the safe side with large margins.
Many of the claimed cognitive biases go away under this view. One textbook example of Bayes' theorem is how doctors overestimate the probability that a patient who tests positive actually has the disease. But what are the priors? Maybe those who visit the doctor did something risky the day before or are feeling funny. Maybe the cost of a false positive is negligible compared to the cost of a false negative, etc. People are less stupid than the TED talk crowd claims.
It's a cheap trick to start an argument with "controversial opinion" or any other similar phrase.
The funny truth in this case is that it's not only cheap, but also a factual counterpoint to your argument:
By stating that what is to follow is controversial, you give the reader a prior. So that when the reader evaluates it, he already does so from the perspective that it's controversial, and thus shouldn't be too harsh in criticizing it further. This is the real-life application of Bayes' theorem from the author of the linked article.
You see? You say it's overrated. But you use it anyway.
So the next time you try to shield yourself from critique, try to build a better argument.
> when the reader evaluates it, he already does so from the perspective that it's controversial
He does so from the perspective that the author believes it's controversial. If you are required to actually assume the opinion is controversial because the author said so, I'd start every paragraph with "you owe me money".
How is explicating the prior an advantage? If the prior is arbitrary anyway, you could just as well stick to your unknown prejudice. This shouldn't change any results, and if it does, you are in trouble anyway, no matter whether you explicitly state your prejudice. I still suspect that Bayesian statistics is just a kind of hack to make results look more convincing.
But this might be a downside, because you can't consciously tweak an unknown prejudice, whereas you can tweak a prior until your results support your hypothesis. In that sense, Bayesian statistics might be more transparent, but less honest.
True, but the question is whether transparency is desirable. I would say it is dangerous for three reasons. First, you might be tempted to tweak your prior until your posterior confirms your hypothesis. Second, using Bayesian reasoning, you make it seem that this procedure is justified. And third, if everyone does the tweaking, for example within a scientific community, nobody would complain, since everyone would automatically confirm their hypothesis with higher posterior probability.
If by prior you just mean "I know something about it" or "everything happens in context", then that's fine. But if that's what you mean, then a vanishingly small number of events have "priors" that can be expressed in a neat analytical form, or approximated, or even quantified. This is part of the problem of frame and context that ML v1.0 tried hard to solve.
Recall as well that in the Bayesian approach the model itself is not subject to Bayesian updating: it's part of your prior, except that you never update it. So you're not merely choosing how to update parameters given data; you're also choosing what you're not going to update.
"There is always a prior" holds only if you really care about computing probabilities. The implicit assumption in Bayesian data analysis is that you go first to "best possible estimate of probability", then to "decision based on that". My point was that you usually need not do the first step.
Example: I wear a bicycle helmet because it costs me next to nothing and it possibly saves my life. I don't do any Bayesian analysis, implicitly or explicitly, because on one side there is an outcome with value minus infinity, so it hardly matters what probability I multiply it with.
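To make the asymmetry explicit, here's a back-of-the-envelope expected-loss comparison (a minimal sketch; all numbers are invented):

    # Decision by bounding the downside rather than estimating probabilities precisely.
    helmet_cost = 50                  # one-off cost, in dollars (invented)
    loss_if_crash_no_helmet = 10**7   # stand-in for "minus infinity" (invented)
    p_serious_crash = 1e-4            # any rough guess in a wide range works

    expected_loss_with_helmet = helmet_cost
    expected_loss_without_helmet = p_serious_crash * loss_if_crash_no_helmet
    print(expected_loss_with_helmet, expected_loss_without_helmet)  # 50 vs 1000.0

The conclusion "wear the helmet" survives the probability estimate being off by 10x in either direction; the payoff asymmetry does all the work.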
You don't need to think hard about massively asymmetric payoffs.
Now what if you needed something like a $5k licence to wear the helmet. Would you feel like thinking harder and analysing further than you did? Most interesting decisions are more like this.
"Possibly saves my life" is your prior there, btw.
Bayes' Theorem is one of the most fundamental theorems in the history of mathematics. I have yet to work in a field where it doesn't have deeply fundamental applications. In many cases expert knowledge or heuristic rules serve as the prior.
Saying it is overrated is like saying the sun or air is overrated.
But hyperbole aside, OP also has a point. If we forget that the estimation of probability itself has a cost, we could be tempted to put more and more resources into more and more sophisticated methods of data collection and analysis, to be more and more certain of our estimate. But if we remember that this process has a cost, sometimes it's more efficient to just add a margin of safety and move on with your life. Bayes' theorem is often used for resource allocation, but the process of optimizing resource allocation itself has a cost.
I agree with the first part. We get by fine for the most part on our own intuitions, driven by fear and risk aversion. We are constantly triggered into action, not persuaded -- not by ourselves or anyone else. But I think the blog here is a call to be more rational. I would consider Bayes another tool out of many. Unfortunately that doesn't change how we are. We're still hungry and trigger-driven at the end of the day.
Which is why I disagree with the other thing you said. People are pretty stupid. To think you know anything without prior research is stupid. Priors need to be deliberately created (an act of learning, understanding and internalizing) for a guess to be educated. Without that we default to stupid, so most of us are stupid about most things.
But just getting by is an incredibly low bar, and the covid situation has tested us. Take covid: mask wearing isn't a "priors" issue. It's the understanding of what a mask does and how it influences risk that is important. You don't need authority to understand the benefits of a mask, though once understood, it would definitely fall under "staying on the safe side with large margins".
That's an interesting response, thanks! I think where I disagree is that people are pretty smart in at least one thing, which is survival -- the proof being that those who were not quickly exited the gene pool. That is powerful filtering that tunes our estimators.
I agree with your last paragraph, but I think it supports my point. I wear a mask because it has zero cost and it may save my life. When I made this decision, I didn't estimate any probabilities and I didn't use Bayes' theorem. Understanding exactly what a mask does and precisely how aerosol transmission of viruses works is almost irrelevant to my decision -- I could be improving my knowledge ("my priors") by studying virology, but there would still be so many uncertainties that it would hardly influence my decision.
> In real life we usually have no idea about priors
Priors are your previous knowledge on the topic.
> One textbook example of Bayes' theorem is how doctors overestimate the probability that a patient who tests positive actually has the disease. But what are the priors?
In this example, doctors overestimate precisely because they don't take the priors into account.
Doing something risky the day before / feeling funny is extra evidence that is assimilated (or should be) into the likelihood ratio P(D|H) / P(D). This is information the patient should share with the doctor.
Of course, if they don't, then the Bayes estimate is the best guess given all the information the doctor has.
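To make the textbook calculation concrete, here's a minimal sketch with invented numbers for the base rate and test characteristics (the qualitative point doesn't depend on the exact values):

    # Hypothetical numbers: 1% base rate, 90% sensitivity, 9% false positive rate.
    p_disease = 0.01            # prior P(H): prevalence in the tested population
    p_pos_given_disease = 0.90  # likelihood P(D|H): test sensitivity
    p_pos_given_healthy = 0.09  # P(D|~H): false positive rate

    # Total probability of a positive test, P(D)
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Bayes: P(H|D) = P(D|H) * P(H) / P(D)
    print(p_pos_given_disease * p_disease / p_pos)  # ~0.092

A positive test means roughly a 9% chance of disease here, not 90%; and if the patient's extra evidence (risky behaviour, feeling funny) raises the prior, the answer moves accordingly.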
Edit: Your criticism about how we choose priors is fair. The better you are at this, the more accurate your answers become. I mention more about this in the "putting it to practice" section.
I agree with you, but my broader point is that in reality we often don't go through the steps "1. estimate probability" -> "2. make a decision based on the probability distribution", because step 1 is so error-prone and intractable that we typically jump directly to step 2 and try to limit our downside.
Of course you could look back and ask: given that I took this decision, what would my prior have been if I had used Bayes' theorem? But my point is that we don't actually use it to make the decision.
While I think it's fair to say that it's hard to come up with informative priors for many real problems, the Bayesian framework is pretty robust if you use weakly informative priors.
Others have already raised this point... but let me try to reiterate. The problem of getting priors is not just one of "acquiring more information". In many cases it's not even clear what such a probability means. For example, you believe that Trump is the 45th POTUS... and you assign it a prior of 0.8. What does the probability mean in this case? In the case of rolling dice it's clear what each probability means, but not in this case. And in any case, how much should you update your probabilities for any given piece of evidence? All of these questions (how to assign priors, how much to update them, etc.) are the _crux_ of Bayesianism, and Bayesianism itself has little to say about them. The founders of Bayesianism were themselves aware of these issues. For a more substantive critique read the following.
Bayes is a good _tool_, but to me it's a very small one, and it doesn't and _cannot_ do most of the heavy lifting of how to live my life. Suppose I want to decide what I should do next week; Bayes is close to useless for that. And that is certainly something "critical thinking" should help me with.
Keywords to search for: "small worlds vs large worlds Bayes".
The book "Probability Theory" by E.T. Jaynes is absolutely amazing at explaining how to actually think about Bayesian probabilities, how to assign priors, and the implications of different priors and posteriors. He explains it far better than I can, but one of the major points is that the prior becomes increasingly irrelevant with less data. If we have a lot of data, then we can assign just about any reasonable prior and still get accurate results. If we don't have much data, then the prior has a large influence. In this case we can't be as confident in the accuracy of our posterior probability, but we can calculate just how much confidence we can have in it.
Your example is hard to get into because it just doesn't make sense. The probability of Trump being the 45th President is 1. Garbage in, garbage out applies here.
If instead we step back to 2016 and say that we know the next president will be blue or red, then it's reasonable to assign the maximally uninformative prior of 0.5 to each outcome. Each additional piece of information modifies the probability. If we learn that the red candidate has been caught on tape speaking about groping women, we would calculate a new, lower posterior probability. How much lower depends on what we can determine about how much this hurts his election chances. We will get better results if we can estimate this factor accurately. This won't be much of an issue if we have many other pieces of data, so that this one piece doesn't have a huge influence, but if this is our only piece of data then its accuracy will have a huge effect on the accuracy of our conclusion. For the sake of example, let's say that our best information provides us with a posterior probability of 0.25. This posterior will be our prior when the next piece of information comes in (i.e., it's no longer an uninformative prior because it now contains past information).
Now suppose we see on TV that Trump has won the election. We don't have any money riding on the exact percentage, so let's just estimate the probability of the TV report being correct at 0.99. If we plug this into Bayes' Theorem with our prior of 0.25, we get a new posterior of 0.97. We're not very confident in this posterior because we used an uninformative prior with 2 very weak updates, but if we really care about accuracy then we can seek out more and better data and get a far more accurate posterior. Moreover, we can calculate the confidence we have in the posterior based on the prior and the data.
Now in 2020, when the election can no longer be challenged, we update our posterior again, but it's not very interesting. Bayes' Theorem still applies, but since we know for a certainty who POTUS 45 is, the prior (0.97) factors out. We get a posterior probability of 1. Yet even though we've only used 3 pieces of data, we can calculate our confidence in our posterior as being extremely high, since one of our data points was so strong.
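For anyone who wants to check the arithmetic, here's a minimal sketch of the TV-report update above (0.25 prior, 0.99-reliable report, with the simplifying assumption that a wrong report is equally likely either way):

    def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
        """Posterior P(H|E) from the prior P(H) and the two likelihoods."""
        numerator = p_evidence_if_true * prior
        return numerator / (numerator + p_evidence_if_false * (1 - prior))

    posterior = bayes_update(0.25, 0.99, 0.01)
    print(round(posterior, 2))  # 0.97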
I don't know how much it can help you decide what to do next week, but if part of that decision involves calculating or reasoning about probabilities, then you absolutely should understand Bayes' Theorem. It's helpful to know how to think about it even if you can't assign exact numbers, in the same way that understanding geometric or logarithmic curves is useful.
I agree with this, but to play devil's advocate (and something I have not resolved internally yet):
Assume a model states there is a 99% likelihood something will occur.
Now the data changes, and the likelihood drops to 1%.
Was the original model "correct" insofar as there was a 99% likelihood of something occurring (given the information it had at the time)? Or should it have "priced in" the fact that the data may change substantially, meaning 99% was far too overconfident?
How are we supposed to interpret variability in model estimates? Do we throw up our hands and say "the data changed"? Or do we hold the models somewhat accountable, saying - no, you weren't "right at the time, given your data". If your estimates are changing so strongly, you are wrong. A 99% estimate that drops to 1% is simply, undeniably "unreliable."
In this case, we somewhat care about "model robustness", but how does this extrapolate to situations where the data changing _should in fact_ impact the model substantially?
I suspect the answer necessitates a deeper look into the nature of probability, risk and uncertainty.
In general there are rigorous ways to quantify those kinds of model deficiencies. From an informal perspective: if the change in data is small and the change in model output is large, your model is poor (and in particular, possibly overfit).
Formally, what you're describing is the bias-variance tradeoff.[1] You can assess this by looking at the conditioning of your model, which measures how sensitive it is to changes.[2] Roughly speaking, the condition number of an estimator (or more generally, a function) measures how large the change f(x) -> f(y) in the range is relative to the change x -> y in the domain, where x and y are close. That will give you the variance. If you try to minimize bias too much, you may overfit your model and it will exhibit high variance in cross-validation. If you try to minimize variance, you may fail to capture relations in your underlying data, which shows up as higher bias.
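For reference, the textbook definition of the relative condition number being paraphrased here (e.g. Trefethen & Bau's Numerical Linear Algebra) is

    \kappa(f, x) = \lim_{\epsilon \to 0} \, \sup_{\|\delta x\| \le \epsilon} \frac{\|f(x + \delta x) - f(x)\| / \|f(x)\|}{\|\delta x\| / \|x\|}

i.e. the worst-case relative change in output per unit relative change in input.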
Practically speaking, for your specific example: if a relatively small change in the sample data caused the model to move its prediction from 99% to 1%, I would assume your model is severely overfit (I can't quantify exactly how small without more context, but let's agree it's small). From a meta-Bayesian perspective, it would take quite a lot of further cross-validation for me to drop that belief ;)
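A toy illustration of that diagnosis, assuming the "model" is a simple curve fit (a sketch with invented degrees and numbers, not anyone's actual model):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)
    y = 2 * x + rng.normal(0, 0.1, size=10)   # the underlying relation is linear

    y_perturbed = y.copy()
    y_perturbed[5] += 0.1                     # one small change in the data

    # Compare a simple model against an overfit one on the same perturbation.
    # (numpy may warn that the degree-9 fit is poorly conditioned -- which is
    # rather the point.)
    for degree in (1, 9):
        before = np.polyval(np.polyfit(x, y, degree), 0.55)
        after = np.polyval(np.polyfit(x, y_perturbed, degree), 0.55)
        print(f"degree {degree}: prediction moves by {abs(after - before):.3f}")

The overfit degree-9 model's prediction moves about ten times more than the degree-1 model's for the same small change in the data.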
Yep. What you're getting at in a sense is "Knightian uncertainty", the presence of unknown unknowns (aka "black swans"). And the heart of the matter is certainty, which we cannot hope to achieve. So what exactly do these tools for handling risk help us achieve? I think this is still a question of active investigation and at any rate, if good answers do exist for it, they certainly haven't filtered down to the common masses.
"It is very important to understand the following point. Probability theory always gives us the estimates that are justified by the information that was actually used in the calculation. Generally, a person who has more relevant information will be able to do a different (more complicated) calculation, leading to better estimates. But of course, this presupposes that the extra information is actually true. If one puts false information into a probability calculation, then the probability theory will give optimal estimates based on false information: these could be very misleading. The onus is always on the user to tell the truth and nothing but the truth; probability theory has no safety device to detect falsehoods."
G. L. Bretthorst, "Bayesian Spectrum Analysis and Parameter Estimation," Springer-Verlag, Lecture Notes in Statistics #48, 1988, pp. 30-31.
I think there is something to be said for the moment one first discovers Bayes, not least because it's often presented, as in this article, as a sort of enlightenment. Perhaps the finest work on the topic is Probability Theory by Jaynes; if the Bayesian sect were to have a bible, that would be it, and BDA3 would be its practical counterpart. I say this both owning and having read both, and having invested a lot of time surveying, learning and applying the Bayesian perspective.
What Bayes makes possible, from an applied statistics perspective, is a kind of unification of a large variety of modelling approaches into a single framework fitted in the same way: simple regression, yes, but also hierarchical models, mixtures, pooled, partially pooled, hurdle, regularised, horseshoe, and so on. So when you learn Stan or PyMC3 or Nimble or whatever, you're enabled to go forth and make myriad custom models: this is powerful, and it is enough to respect Bayes.
Various results show that lots of other models can in principle be expressed in a Bayesian way given the right prior, hinting in some theoretical sense that Bayes is a universal modelling approach: the panacea you've been looking for, young statistician.
However, Bayes has many epistemological problems and for the actually interested reader see here for a summary:
For those, like myself, who are out there (like Breiman was) trying to actually solve real-world problems, you find yourself quickly limited by the Bayesian approach. The data generating process of the real world is not at all obvious MOST of the time, but as a Bayesian you are forced to pretend otherwise. As much as I love problems where Bayes does work well, they are fairly few and far between for me.
So as a closing comment: let's not "Bayes all the things", as tempting as it is. It is in many respects the first part of the journey for many avid evangelists and self-confessed "how I became a Bayesian" converts, but it's not the end-all.
> The mind doesn’t always work like Bayes Theorem prescribes. There’s lots of things Bayes can’t explain. Don’t try to fit everything you see into Bayes rule. But wherever beliefs are concerned, it’s a good model to use. Find the best beliefs and calibrate them!
The one concern I'm really interested by, and don't understand, is this:
> There are thus lots of reasons to believe that we do not think and should not think in a Bayesian way.
Can you give an example that can serve as motivation to read the entire paper linked?
Section 6.2 gives a very concise list of problems.
For further reading, have a go at paraconsistent logic for a more total upheaval of the Jaynesian view of the centrality of predicate logic as the principal language of science (of which Bayes is a probabilistic relaxation, according to Jaynes).
I didn't allege it wasn't a problem for Frequentists in a similar situation. I'd add that the modelling world is very far from partitioned into Bayesians and Frequentists.
I think you're quite right about that, which is not to say Bayes can't be very useful, but I do agree that it has been elevated to the level of religion.
Yeah. Further context, for other readers: it's important to distinguish between Bayesianism the intellectual movement and Bayesian modeling as it exists in the professional and academic statistical community.
There are communities which pick up Bayes Theorem in isolation (i.e. without any other statistical education or knowledge) and apply it to every debate and discussion under the Sun, often pontificating about how rational they're being about "updating their priors." But using Bayes theorem qualitatively instead of quantitatively - and without regard for its pitfalls - often leads to confirmation bias in which someone lends a veneer of rigorous probability to something they already unconsciously believed anyway. It's also sometimes used to dress up pseudoscience with intellectual polish, and it takes work to unpack those errors.
Bayesian probability is not a panacea, and does not displace frequentist probability. The two paradigms are fully compatible (you can move from one to the other via inversion of the estimator function). But they answer different questions, both of which are important. It is also easy to make statistical errors in both paradigms.
I am confused by the criticisms "a superimposed set of abstractions. Nothing more." and "nothing to do with its processes". Surely that is exactly what this kind of framework attempts to achieve? I.e., ways of abstracting to make better decisions when the problem is too complex or soft for process-level understanding.
I think the OP is saying that, unlike how it's often presented, by learning Bayes you're not "discovering the truth about how to update beliefs from data". I.e. it's not a panacea; it's a framework with strengths and weaknesses.
Recall, most statisticians are not Bayesian. Most science is not Bayesian, and it's not because everyone is naive or just too stupid to see it. The Bayesian approach is well known and long-standing, but it takes using it in anger to see what it is good for and what it is not.