If Google is using those pictures as captchas, it is probably because they could not write an algorithm to decipher them.
If, in the worst case, this results in spammers creating even better pattern recognition algorithms than the current cutting edge, it is certainly worth all the effort.
User responses to the pictures are not checked for correctness. Every time you see a reCAPTCHA, you can just enter a garbage value for the word that is less distorted and the system will accept it.
It's not always the less distorted one. As for data integrity, presumably Google shops each word out to multiple users so over time they get an idea of what the proper response is.
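For what it's worth, the scheme described in the last two comments might look roughly like the sketch below. This is a guess at the idea, not Google's actual code: `check_captcha`, `votes`, and `image_id` are made-up names, and it assumes each challenge pairs one word with a known answer (the control) with one unknown image.

```python
def check_captcha(control_answer, control_truth, unknown_answer, votes, image_id):
    """Pass or fail the user on the control word only (hypothetical sketch).

    The answer for the unknown image is never checked; it is only
    recorded as one vote for later aggregation.
    """
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed:
        # Only keep votes from users who got the control word right.
        votes.setdefault(image_id, []).append(unknown_answer.strip().lower())
    return passed
```

Under these assumptions, garbage typed for the unknown word is accepted, but it ends up as a single vote among many rather than as the recorded answer.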
Exactly. If they show each image to 100 users, they should get a sharply peaked distribution of answers, and they can tell which images are hard for humans to read from how skewed that distribution is and from the approximate location of the number in their GPS database.
They could get a ton right and very, very few wrong with this system.
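A minimal sketch of that voting idea, assuming answers are collected as plain strings per image. The function name `consensus` and the thresholds `min_votes` and `min_share` are invented for illustration; nothing here is known about how Google actually aggregates responses.

```python
from collections import Counter

def consensus(answers, min_votes=10, min_share=0.8):
    """Return (label, share) if one answer dominates, else (None, share).

    min_votes and min_share are arbitrary illustrative thresholds.
    """
    if len(answers) < min_votes:
        return None, 0.0  # not enough data to decide yet
    label, count = Counter(answers).most_common(1)[0]
    share = count / len(answers)
    return (label if share >= min_share else None), share

# A sharply peaked distribution: 95 of 100 users agree.
clear = ["1024"] * 95 + ["1034"] * 3 + ["7024"] * 2
# A flat distribution: the image is probably hard for humans to read.
hard = ["17"] * 50 + ["11"] * 50
print(consensus(clear))
print(consensus(hard))
```

Images where no answer reaches the threshold could be flagged as hard to read and shown to more users, while a few garbage entries barely move the share of the dominant answer.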
The real question is, how does their system identify numbers in the photo without actually knowing what the numbers are?
I'd guess that they can detect that it's a house number (look for oval shapes, or stuff that looks like characters in the usual spot on a house, etc.) but not be highly confident what the exact number depicted is.